
ETKDG Refinement

The Experimental Torsion Knowledge Distance Geometry (ETKDG) refinement is what distinguishes this algorithm from plain distance geometry. It uses experimentally observed torsion angle preferences from the Cambridge Structural Database (CSD) to guide conformer geometry toward chemically realistic conformations.

The Key Insight

Plain distance geometry produces geometrically valid structures, but the torsion angles around rotatable bonds are essentially random. The ETKDG approach adds a torsion preference force field that biases the conformer toward experimentally observed dihedral angles.

CSD Torsion Pattern Library

sci-form includes 837 SMARTS patterns with associated Fourier coefficients, derived from the Cambridge Structural Database analysis by Guba et al.

Pattern Categories

| Category | Count | Description |
|---|---|---|
| v2 patterns | 365 | General torsion patterns |
| Macrocycle patterns | 472 | Patterns specific to macrocyclic systems |

Fourier Representation

Each pattern encodes the preferred torsion angle distribution as a 6-term Fourier series:

$$V(\phi) = \sum_{k=1}^{6} V_k \left(1 + s_k \cos(k\phi)\right)$$

where:

  • $V_k$ is the amplitude for the $k$-th Fourier component
  • $s_k \in \{-1, +1\}$ is the sign
  • $\phi$ is the dihedral angle

The coefficients $V_k$ are derived by fitting to the observed torsion angle histograms from crystallographic data.
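To make the roles of the amplitude and the sign concrete, the series can be evaluated directly. The following is a minimal sketch, not sci-form's implementation: the function name `torsion_energy` and the example coefficients are hypothetical and chosen only for illustration.

```python
import math

def torsion_energy(phi_deg, amplitudes, signs):
    """Evaluate V(phi) = sum_k V_k * (1 + s_k * cos(k * phi)) for k = 1..6."""
    if len(amplitudes) != 6 or len(signs) != 6:
        raise ValueError("expected six amplitudes and six signs")
    phi = math.radians(phi_deg)
    return sum(
        v * (1.0 + s * math.cos(k * phi))
        for k, (v, s) in enumerate(zip(amplitudes, signs), start=1)
    )

# Hypothetical coefficients: one twofold term with a negative sign,
# i.e. V(phi) = 7.0 * (1 - cos(2 * phi)). All other terms are zero.
amplitudes = [0.0, 7.0, 0.0, 0.0, 0.0, 0.0]
signs = [1, -1, 1, 1, 1, 1]

for angle in (0, 90, 180):
    print(angle, round(torsion_energy(angle, amplitudes, signs), 6))
```

With these assumed coefficients the energy vanishes at 0° and 180° and peaks at 90°, which is the shape expected of a planar preference. Flipping the sign of the twofold term to +1 would move the minima to ±90° instead.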

Pattern Matching Priority

For each rotatable bond, patterns are matched in order:

  1. CSD patterns — first-match-wins among the 837 SMARTS
  2. Basic knowledge — fallback rules for common chemical environments

TIP

Pattern matching uses a first-match-wins strategy. More specific patterns are listed before general ones, so a pattern for "amide C-N" will match before a generic "any C-N" pattern.
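The ordering logic can be sketched without any cheminformatics dependency. In the example below, plain Python predicates stand in for SMARTS matching, and every name (`TorsionRule`, `match_torsion`, the rule lists, the bond dictionaries) is hypothetical rather than part of sci-form's API.

```python
from typing import Callable, NamedTuple, Optional, Sequence

class TorsionRule(NamedTuple):
    name: str
    matches: Callable[[dict], bool]  # hypothetical stand-in for a SMARTS match

def match_torsion(
    bond: dict, tiers: Sequence[Sequence[TorsionRule]]
) -> Optional[TorsionRule]:
    """Return the first rule that matches, scanning tiers and rules in order."""
    for tier in tiers:
        for rule in tier:
            if rule.matches(bond):
                return rule
    return None

# Illustrative rule lists: within a tier, specific rules precede general ones.
csd_tier = [
    TorsionRule("amide C-N", lambda b: b["pair"] == ("C", "N") and b.get("amide", False)),
    TorsionRule("any C-N", lambda b: b["pair"] == ("C", "N")),
]
basic_tier = [
    TorsionRule("sp3-sp3 fallback", lambda b: b.get("sp3_sp3", False)),
]
TIERS = [csd_tier, basic_tier]

amide_bond = {"pair": ("C", "N"), "amide": True}
amine_bond = {"pair": ("C", "N")}
alkane_bond = {"pair": ("C", "C"), "sp3_sp3": True}

for bond in (amide_bond, amine_bond, alkane_bond):
    print(match_torsion(bond, TIERS).name)
```

The amide bond satisfies both C-N predicates, but the specific rule is listed first and therefore wins. The alkane bond matches nothing in the first tier and falls through to the fallback tier.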

Basic Knowledge Torsion Rules

When no CSD pattern matches a rotatable bond, these rules provide reasonable defaults:

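| Environment | Rule | $V_k$ |
|---|---|---|
| Ring bond (4-member) | Flat | $V_2 = 100.0$ |
| Ring bond (5-member) | Flat | $V_2 = 100.0$ |
| Ring bond (6-member) | Flat | $V_2 = 100.0$ |
| Double bond | Planar | $V_2 = 100.0$ |
| Amide C-N | Planar preference | $V_2 = 7.0$ |
| Ester C-O | Planar preference | $V_2 = 7.0$ |
| Aromatic-X | Semi-planar | $V_2 = 5.0$ |
| SP3-SP3 | Staggered | $V_3 = 7.0$ |
| Ether/Amine | Soft staggered | $V_3 = 2.5$ |
| Biaryl | Semi-planar | $V_2 = 5.0$ |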

ETKDG 3D Force Field Components

The complete ETKDG 3D force field combines torsion preferences with structural constraints:

1. Torsion Contributions (from CSD or basic knowledge)

For each matched torsion:

$$E_{\text{tors}} = V(\phi) = \sum_{k=1}^{6} V_k \left(1 + s_k \cos(k\phi)\right)$$

Computed via Chebyshev recurrence for efficiency — only one cos evaluation per torsion angle.
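The recurrence in question is the standard Chebyshev identity $\cos((k+1)\phi) = 2\cos\phi\,\cos(k\phi) - \cos((k-1)\phi)$, which yields every $\cos(k\phi)$ from a single evaluation of $\cos\phi$. The sketch below illustrates the idea; the function names are hypothetical and the code is not taken from sci-form.

```python
import math

def cos_multiples(cos_phi, n_terms=6):
    """Return [cos(1*phi), ..., cos(n_terms*phi)] given only cos(phi)."""
    values = [cos_phi]
    prev, curr = 1.0, cos_phi  # cos(0*phi) and cos(1*phi)
    for _ in range(2, n_terms + 1):
        # Chebyshev recurrence: T_{k+1}(x) = 2*x*T_k(x) - T_{k-1}(x)
        prev, curr = curr, 2.0 * cos_phi * curr - prev
        values.append(curr)
    return values

def torsion_energy_chebyshev(phi, amplitudes, signs):
    """Sum V_k * (1 + s_k * cos(k*phi)) with one call to math.cos."""
    cos_k = cos_multiples(math.cos(phi), len(amplitudes))
    return sum(v * (1.0 + s * c) for v, s, c in zip(amplitudes, signs, cos_k))

def torsion_energy_direct(phi, amplitudes, signs):
    """Reference version that calls math.cos once per term."""
    return sum(
        v * (1.0 + s * math.cos(k * phi))
        for k, (v, s) in enumerate(zip(amplitudes, signs), start=1)
    )

# Arbitrary test coefficients, for comparison only.
amps = [0.5, 1.0, 7.0, 0.2, 0.0, 0.3]
sgns = [1, -1, 1, -1, 1, 1]
phi = 0.7  # radians
print(torsion_energy_chebyshev(phi, amps, sgns))
print(torsion_energy_direct(phi, amps, sgns))
```

Both functions should agree to floating-point precision. The recurrence version replaces five of the six cosine calls with a multiply and a subtract each.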

2. UFF Inversions (Out-of-Plane)

For SP2 centers with 3 heavy neighbors:

$$E_{\text{inv}} = K\,(1 - \sin Y)$$

where $Y$ is the Wilson out-of-plane angle. The energy is zero when the atom is perfectly planar ($Y = 90°$) and increases as it deviates.

Three permutations are evaluated per improper center, cycling through the neighbor triple:

$$E_{\text{inv}}^{\text{total}} = \frac{K_{\text{base}} \cdot 10}{3} \sum_{p=1}^{3} \left(1 - \sin Y_p\right)$$
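The three-permutation sum can be illustrated with plain vector arithmetic. This sketch assumes that $Y$ is measured between one center-to-neighbor bond and the normal of the plane spanned by the other two bonds, which is the convention that gives $Y = 90°$ for a planar center. It also assumes the prefactor is $K_{\text{base}} \cdot 10 / 3$. All function names are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _unit(a):
    norm = math.sqrt(_dot(a, a))
    return tuple(x / norm for x in a)

def sin_y(center, n1, n2, n3):
    """sin(Y), with Y the angle between the center->n3 bond and the
    normal of the plane through n1, center and n2 (assumed convention)."""
    normal = _unit(_cross(_sub(n1, center), _sub(n2, center)))
    bond = _unit(_sub(n3, center))
    cos_y = max(-1.0, min(1.0, _dot(normal, bond)))
    return math.sqrt(1.0 - cos_y * cos_y)

def inversion_energy(center, neighbors, k_base):
    """Sum (1 - sin Y_p) over the three cyclic permutations of the neighbors."""
    a, b, c = neighbors
    perms = ((a, b, c), (b, c, a), (c, a, b))
    total = sum(1.0 - sin_y(center, *p) for p in perms)
    return k_base * 10.0 / 3.0 * total

center = (0.0, 0.0, 0.0)
planar = [(1.0, 0.0, 0.0), (-0.5, 0.866, 0.0), (-0.5, -0.866, 0.0)]
pyramidal = [(x, y, 0.4) for x, y, _ in planar]  # neighbors lifted out of plane

print(inversion_energy(center, planar, k_base=1.0))
print(inversion_energy(center, pyramidal, k_base=1.0))
```

The planar arrangement gives zero energy, while lifting the neighbors out of the plane gives a positive penalty. The sketch does not guard against degenerate geometries in which two neighbors are collinear with the center.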

3. Distance Constraints

Maintain bond lengths and angles via flat-bottom potentials:

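$$E_{\text{dist}} = \frac{k}{2}\,\max\left(0,\ |d - d_0| - \epsilon\right)^2$$

| Bond type | $k$ | $\epsilon$ |
|---|---|---|
| 1-2 (bonds) | 100 | 0.01 Å |
| 1-3 (improper) | 100 | varies |
| Long-range | 10 | varies |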
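A flat-bottom potential charges nothing while the distance stays within the tolerance $\epsilon$ of its target, and grows quadratically beyond it. The sketch below shows that behavior with a hypothetical helper, using $k = 100$, $\epsilon = 0.01$ Å and a target of 1.54 Å purely as illustrative inputs.

```python
def flat_bottom_energy(d, d0, k, eps):
    """E = (k / 2) * max(0, |d - d0| - eps)^2."""
    excess = max(0.0, abs(d - d0) - eps)
    return 0.5 * k * excess * excess

# Illustrative inputs only: a 1.54 Angstrom target with a 0.01 Angstrom tolerance.
D0, K, EPS = 1.54, 100.0, 0.01

for d in (1.535, 1.54, 1.60):
    print(d, flat_bottom_energy(d, D0, K, EPS))
```

Distances inside the tolerance window contribute zero energy and zero gradient, so the torsion terms can act without competing against small bond-length deviations.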

4. Linear Angle Constraints

For SP atoms (triple bonds, allenes), maintain 180° angle:

$$E_{\text{angle}} = k\,(\theta - 180°)^2$$

Optimization

The ETKDG 3D force field is minimized with a single BFGS pass:

  • Maximum iterations: 300
  • No restarts — this is a refinement step, not a global optimization
  • Early skip: if initial energy < $10^{-5}$, skip entirely
  • Result: final 3D coordinates ready for validation

Example: Butane Torsion

For butane (CCCC), the central C-C bond matches a CSD pattern with a staggered preference ($V_3 = 7.0$).

The torsion force field naturally drives the dihedral toward the anti (180°) or gauche (±60°) conformations, which match experimental observation.
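As a worked check, assume that the threefold term is the only nonzero coefficient and that its sign is $s_3 = +1$. Neither assumption is stated above, but both are needed for the minima to fall on the staggered angles. The series then reduces to:

$$V(\phi) = 7.0\,\left(1 + \cos 3\phi\right)$$

$$
\begin{aligned}
V(60°) &= 7.0\,(1 + \cos 180°) = 0 \\
V(180°) &= 7.0\,(1 + \cos 540°) = 0 \\
V(0°) &= 7.0\,(1 + \cos 0°) = 14.0 \\
V(120°) &= 7.0\,(1 + \cos 360°) = 14.0
\end{aligned}
$$

Under these assumptions the gauche (±60°) and anti (180°) geometries sit at the minima, while the eclipsed geometries (0°, ±120°) sit at the maxima. A single threefold term makes gauche and anti equal in energy, so it cannot by itself favor one over the other.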

Released under the MIT License.