Skip to content

Force Fields

sci-form uses two force fields in sequence: a Bounds Violation Force Field for the initial embedding stage, and an ETKDG 3D Force Field for the refinement stage. Both are minimized using the BFGS quasi-Newton optimizer.

1. Bounds Violation Force Field

This force field operates in 4D (or 3D) space and penalizes violations of the distance bounds. Its goal is to move atoms from the approximate eigendecomposition coordinates to positions that satisfy all pairwise distance constraints.

Energy Function

Etotal=Edist+wchiralEchiral+w4DE4D

Distance Violation Term

For each pair (i,j) with bounds [lij,uij] and current distance dij, only considering pairs where uijlij5.0 (the "basin" filter):

Edist=i<j{(dij2uij21)2if dij>uij(2lij2lij2+dij21)2if dij<lij0otherwise

The basin filter (ul5.0) skips widely constrained pairs (typically distant non-bonded atoms) to focus optimization on well-determined distances.

Chiral Volume Term

For each chiral center with neighbors (a,b,c,d):

Echiral={(VVupper)2if V>Vupper(VVlower)2if V<Vlower0otherwise

where V=v1(v2×v3) is the signed volume of the tetrahedron formed by the four neighbor atoms, with vk=xnkxn0.

4D Penalty Term

E4D=i=1Nxi42

This penalizes the 4th dimension, driving coordinates toward a 3D embedding.

Two-Phase Minimization

Phasewchiralw4DPurpose
11.00.1Establish correct chirality
20.21.0Collapse 4th dimension

Phase 2 only runs when 4D embedding is used (molecule has chiral centers).

BFGS Optimizer (Bounds FF)

The bounds force field uses a full BFGS optimizer with an Ndim×Ndim inverse Hessian approximation:

  • Ndim = N×d (number of atoms × number of dimensions)
  • Maximum iterations: 400 per pass
  • Force tolerance: 103
  • Line search: Cubic/quadratic interpolation backtracking (matching RDKit)
  • Gradient scaling: ×0.1, capped at 10.0 per component
  • Restarts: up to 50 passes of 400 iterations each

The optimizer converges when the maximum gradient component falls below the force tolerance.

2. ETKDG 3D Force Field

After projecting to 3D, this force field refines the geometry using experimentally-derived torsion preferences.

Energy Function

E3D=Etorsion+Einversion+Edist12+Eangle+Edist13+Elongrange

Terms are added in this exact order, matching RDKit's construct3DForceField.

M6 Torsion Terms

For each rotatable bond with matched CSD pattern:

Etorsion=k=16Vk(1+skcos(kϕ))

where Vk and sk are the Fourier coefficients from the CSD pattern. The cosines are computed using Chebyshev polynomial recurrence (no trigonometric functions beyond the initial cosϕ):

cos(2ϕ)=2cos2ϕ1cos(3ϕ)=4cos3ϕ3cosϕcos(4ϕ)=8cos4ϕ8cos2ϕ+1cos(5ϕ)=16cos5ϕ20cos3ϕ+5cosϕcos(6ϕ)=32cos6ϕ48cos4ϕ+18cos2ϕ1

UFF Inversion Terms

For SP2 atoms (C, N, O) with 3 neighbors, out-of-plane bending is penalized:

Einv=Kbase103perm(1sinY)

where Y is the Wilson out-of-plane angle, and:

  • Kbase=50.0 for C bonded to SP2 oxygen
  • Kbase=6.0 for all other cases

Three permutations are evaluated for each improper center (cycling through the neighbor triples).

Distance Constraints

TypeForce Constant kApplication
1-2 (bond lengths)100.0All bonded pairs, flat-bottom ±0.01 Å
1-3 (angle distances)100.0Pairs involving improper centers
Long-range10.0All remaining pairs from bounds matrix

Flat-bottom form:

E=k2(ddbound)2when d outside [d0ϵ,d0+ϵ]

Angle Constraints

For SP/linear atoms, constrain the angle to 180°:

Eangle=k(θ180°)2

BFGS Optimizer (ETKDG 3D)

The 3D force field uses a similar BFGS optimizer:

  • Maximum iterations: 300 (single pass)
  • Early termination: skip if initial energy < 105
  • No restarts — single optimization pass

Gradient Computation

Both force fields use analytical gradients for efficient optimization. Key gradient formulas:

Distance Term Gradient

For the distance between atoms i and j:

dijxi=xixjdij

Torsion Angle Gradient

The torsion angle ϕ defined by atoms (a,b,c,d):

ϕxa=|b1||n1|2n1ϕxd=|b2||n2|2n2

where b1=xbxa, b2=xcxd, n1=b1×bmid, n2=b2×bmid.

Chiral Volume Gradient

For the chiral volume V=v1(v2×v3):

Vxn1=v2×v3Vxn2=v3×v1Vxn3=v1×v2Vxcenter=(Vxn1+Vxn2+Vxn3)

BFGS Algorithm

Both force fields use the Broyden-Fletcher-Goldfarb-Shanno quasi-Newton method:

Hk+11=(IskykTykTsk)Hk1(IykskTykTsk)+skskTykTsk

where:

  • sk=xk+1xk (step)
  • yk=fk+1fk (gradient change)
  • Hk1 is the inverse Hessian approximation

The line search uses the following strategy:

  1. Initial step: from the BFGS direction
  2. Backtracking: cubic/quadratic interpolation
  3. Sufficient decrease: Armijo condition with c=104
  4. Maximum step: scaled to not exceed 100.0

3. MMFF94 Force Field

The Merck Molecular Force Field 94 (MMFF94) is used for post-embedding energy refinement and geometry optimization of organic drug-like molecules. sci-form implements the full four-term MMFF94 energy function with analytical gradients.

Atom Types

MMFF94 assigns each heavy atom a chemical context type used to look up force constants:

TypeDescription
HCHydrogen bonded to carbon
CSpsp³ carbon
CSp2sp² carbon (non-aromatic)
CBAromatic carbon
CRsp³ carbon in ring
NRsp³ nitrogen in ring
N2sp² nitrogen
NAmAmide nitrogen
NR2Aromatic nitrogen
NCNitrile nitrogen
ORsp³ oxygen
O2sp² oxygen
FFluorine
PPhosphorus
SSulfur
ClChlorine
BrBromine
IIodine

Energy Function

EMMFF94=Es+Eθ+Eϕ+EvdW

1. Bond Stretching (Quartic Form)

MMFF94 uses a quartic Taylor expansion around the equilibrium bond length r0:

Es=143.932512kbΔr2[1+csΔr+712cs2Δr2]

where:

  • Δr=rr0 is the bond length deviation (Å)
  • kb is the bond force constant (md/Å)
  • cs=2.0 Å1 is the cubic stretch constant
  • The factor 143.9325 converts md/Å to kcal/mol

The quartic form captures asymmetric bond behaviour: bonds are harder to compress than to stretch.

Analytical gradient (used in sci-form):

Esr=143.9325kbΔr[1+32csΔr+76cs2Δr2]

2. Angle Bending (Cubic Form)

Eθ=0.04384412kaΔθ2[1+cbΔθ]

where:

  • Δθ=θθ0 in degrees
  • ka is the angle force constant (md·Å/rad²)
  • cb=0.014 deg1 is the cubic bend constant
  • The factor 0.043844 converts to kcal/mol

The cubic term models the asymmetry of valence angle deformation near linearity.

Analytical gradient:

Eθθ=0.043844kaΔθ[1+32cbΔθ]180π

3. Torsion (3-Term Fourier)

Eϕ=12[V1(1+cosϕ)+V2(1cos2ϕ)+V3(1+cos3ϕ)]

where V1, V2, V3 are barrier heights (kcal/mol) specific to each atom-type quadruplet (i-j-k-l) and ϕ is the dihedral angle in radians.

The three Fourier terms model:

  • V1: one-fold periodicity (conformational asymmetry)
  • V2: two-fold periodicity (conjugation effects)
  • V3: three-fold periodicity (staggered/eclipsed preference)

The dihedral gradient uses numerical central differences (ε=105):

EϕxiEϕ(xi+ε)Eϕ(xiε)2ε

4. Buffered 14-7 van der Waals (Halgren 1996)

MMFF94 replaces the Lennard-Jones 12-6 potential with Halgren's Buffered 14-7 form, which provides a softer repulsive wall and better treatment of close contacts:

EvdW=εij(1.07RijRij+0.07Rij)7(1.12(Rij)7Rij7+0.12(Rij)72)

where:

  • Rij is the interatomic distance
  • Rij is the empirical combined van der Waals radius
  • εij is the well depth

The 1.07 and 0.07 buffer factors soften the repulsive wall relative to Lennard-Jones 12-6, and the 1.12 / 0.12 attractive factors give a smoother well minimum.

Analytical gradient (Cartesian components):

EvdWxi=εij[(rep)Ratt+rep(att)R]xixjRij

Implementation: Mmff94Builder

The Mmff94Builder constructs the force field from a parsed molecule:

rust
use sci_form::forcefield::mmff94::Mmff94Builder;

let mol = sci_form::parse("CCO")?;
let mut builder = Mmff94Builder::new(&mol);
builder.build();  // Assigns atom types and creates all energy terms

// Evaluate energy + inject gradients
let mut grad = vec![0.0; mol.atoms.len() * 3];
let energy = builder.total_energy(&coords, &mut grad);

Gradient Validation

sci-form provides a validate_gradients() utility that checks analytical gradients against central-difference numerical gradients:

rust
use sci_form::forcefield::mmff94::validate_gradients;

let max_err = validate_gradients(&term, &coords, 1e-5);
assert!(max_err < 1e-4, "gradient error: {max_err}");

This is used in the test suite to verify correctness of all four term types.

References

  1. Halgren, T. A. Merck Molecular Force Field. J. Comput. Chem. 1996, 17, 490–519.
  2. Halgren, T. A. MMFF VII. Characterization of MMFF94, MMFF94s, and other widely available force fields. J. Comput. Chem. 1999, 20, 730–748.

Released under the MIT License.