Force Fields
sci-form uses two force fields in sequence: a Bounds Violation Force Field for the initial embedding stage, and an ETKDG 3D Force Field for the refinement stage. Both are minimized using the BFGS quasi-Newton optimizer.
1. Bounds Violation Force Field
This force field operates in 4D (or 3D) space and penalizes violations of the distance bounds. Its goal is to move atoms from the approximate eigendecomposition coordinates to positions that satisfy all pairwise distance constraints.
Energy Function
Distance Violation Term
For each pair
The basin filter (
Chiral Volume Term
For each chiral center with neighbors
where
4D Penalty Term
This penalizes the 4th dimension, driving coordinates toward a 3D embedding.
Two-Phase Minimization
| Phase | Purpose | ||
|---|---|---|---|
| 1 | 1.0 | 0.1 | Establish correct chirality |
| 2 | 0.2 | 1.0 | Collapse 4th dimension |
Phase 2 only runs when 4D embedding is used (molecule has chiral centers).
BFGS Optimizer (Bounds FF)
The bounds force field uses a full BFGS optimizer with an
= (number of atoms × number of dimensions) - Maximum iterations: 400 per pass
- Force tolerance:
- Line search: Cubic/quadratic interpolation backtracking (matching RDKit)
- Gradient scaling:
, capped at 10.0 per component - Restarts: up to 50 passes of 400 iterations each
The optimizer converges when the maximum gradient component falls below the force tolerance.
2. ETKDG 3D Force Field
After projecting to 3D, this force field refines the geometry using experimentally-derived torsion preferences.
Energy Function
Terms are added in this exact order, matching RDKit's construct3DForceField.
M6 Torsion Terms
For each rotatable bond with matched CSD pattern:
where
UFF Inversion Terms
For SP2 atoms (C, N, O) with 3 neighbors, out-of-plane bending is penalized:
where
for C bonded to SP2 oxygen for all other cases
Three permutations are evaluated for each improper center (cycling through the neighbor triples).
Distance Constraints
| Type | Force Constant | Application |
|---|---|---|
| 1-2 (bond lengths) | 100.0 | All bonded pairs, flat-bottom ±0.01 Å |
| 1-3 (angle distances) | 100.0 | Pairs involving improper centers |
| Long-range | 10.0 | All remaining pairs from bounds matrix |
Flat-bottom form:
Angle Constraints
For SP/linear atoms, constrain the angle to 180°:
BFGS Optimizer (ETKDG 3D)
The 3D force field uses a similar BFGS optimizer:
- Maximum iterations: 300 (single pass)
- Early termination: skip if initial energy <
- No restarts — single optimization pass
Gradient Computation
Both force fields use analytical gradients for efficient optimization. Key gradient formulas:
Distance Term Gradient
For the distance between atoms
Torsion Angle Gradient
The torsion angle
where
Chiral Volume Gradient
For the chiral volume
BFGS Algorithm
Both force fields use the Broyden-Fletcher-Goldfarb-Shanno quasi-Newton method:
where:
(step) (gradient change) is the inverse Hessian approximation
Line Search
The line search uses the following strategy:
- Initial step: from the BFGS direction
- Backtracking: cubic/quadratic interpolation
- Sufficient decrease: Armijo condition with
- Maximum step: scaled to not exceed 100.0
3. MMFF94 Force Field
The Merck Molecular Force Field 94 (MMFF94) is used for post-embedding energy refinement and geometry optimization of organic drug-like molecules. sci-form implements the full four-term MMFF94 energy function with analytical gradients.
Atom Types
MMFF94 assigns each heavy atom a chemical context type used to look up force constants:
| Type | Description |
|---|---|
HC | Hydrogen bonded to carbon |
CSp | sp³ carbon |
CSp2 | sp² carbon (non-aromatic) |
CB | Aromatic carbon |
CR | sp³ carbon in ring |
NR | sp³ nitrogen in ring |
N2 | sp² nitrogen |
NAm | Amide nitrogen |
NR2 | Aromatic nitrogen |
NC | Nitrile nitrogen |
OR | sp³ oxygen |
O2 | sp² oxygen |
F | Fluorine |
P | Phosphorus |
S | Sulfur |
Cl | Chlorine |
Br | Bromine |
I | Iodine |
Energy Function
1. Bond Stretching (Quartic Form)
MMFF94 uses a quartic Taylor expansion around the equilibrium bond length
where:
is the bond length deviation (Å) is the bond force constant (md/Å) is the cubic stretch constant - The factor 143.9325 converts md/Å to kcal/mol
The quartic form captures asymmetric bond behaviour: bonds are harder to compress than to stretch.
Analytical gradient (used in sci-form):
2. Angle Bending (Cubic Form)
where:
in degrees is the angle force constant (md·Å/rad²) is the cubic bend constant - The factor 0.043844 converts to kcal/mol
The cubic term models the asymmetry of valence angle deformation near linearity.
Analytical gradient:
3. Torsion (3-Term Fourier)
where
The three Fourier terms model:
: one-fold periodicity (conformational asymmetry) : two-fold periodicity (conjugation effects) : three-fold periodicity (staggered/eclipsed preference)
The dihedral gradient uses numerical central differences (
4. Buffered 14-7 van der Waals (Halgren 1996)
MMFF94 replaces the Lennard-Jones 12-6 potential with Halgren's Buffered 14-7 form, which provides a softer repulsive wall and better treatment of close contacts:
where:
is the interatomic distance is the empirical combined van der Waals radius is the well depth
The 1.07 and 0.07 buffer factors soften the repulsive wall relative to Lennard-Jones 12-6, and the 1.12 / 0.12 attractive factors give a smoother well minimum.
Analytical gradient (Cartesian components):
Implementation: Mmff94Builder
The Mmff94Builder constructs the force field from a parsed molecule:
use sci_form::forcefield::mmff94::Mmff94Builder;
let mol = sci_form::parse("CCO")?;
let mut builder = Mmff94Builder::new(&mol);
builder.build(); // Assigns atom types and creates all energy terms
// Evaluate energy + inject gradients
let mut grad = vec![0.0; mol.atoms.len() * 3];
let energy = builder.total_energy(&coords, &mut grad);Gradient Validation
sci-form provides a validate_gradients() utility that checks analytical gradients against central-difference numerical gradients:
use sci_form::forcefield::mmff94::validate_gradients;
let max_err = validate_gradients(&term, &coords, 1e-5);
assert!(max_err < 1e-4, "gradient error: {max_err}");This is used in the test suite to verify correctness of all four term types.
References
- Halgren, T. A. Merck Molecular Force Field. J. Comput. Chem. 1996, 17, 490–519.
- Halgren, T. A. MMFF VII. Characterization of MMFF94, MMFF94s, and other widely available force fields. J. Comput. Chem. 1999, 20, 730–748.