Algorithm Overview

sci-form is a computational chemistry library with two main areas: 3D conformer generation and quantum-chemistry-inspired property computation. This page describes both pipelines at a high level.


Part 1: ETKDGv2 Conformer Pipeline

sci-form implements ETKDGv2 (Experimental Torsion Knowledge Distance Geometry v2) to generate 3D molecular conformers from SMILES strings.

The 9-Step Pipeline

Phase 1: Topology → Bounds (Steps 1–3)

Build a molecular graph and derive distance constraints between all atom pairs. Constraints form a bounds matrix $B$ where $l_{ij} \le d_{ij} \le u_{ij}$.

  • 1-2 bounds: Bond lengths from UFF parameters
  • 1-3 bounds: From bond lengths + equilibrium angles (law of cosines)
  • 1-4 bounds: From torsion angle cis/trans extremes
  • VDW bounds: Lower bounds from van der Waals radii
  • Smoothing: Floyd-Warshall to enforce the triangle inequality
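
The smoothing step can be sketched as follows. This is a minimal illustration of triangle-inequality bounds smoothing in the Floyd-Warshall style, not sci-form's actual code; the function name `smooth_bounds` and the use of full symmetric nested lists are assumptions made for the example.

```python
def smooth_bounds(lower, upper):
    """Triangle-smooth symmetric bounds matrices; returns new (lower, upper)."""
    n = len(upper)
    lo = [row[:] for row in lower]
    up = [row[:] for row in upper]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Upper bound: d_ij cannot exceed the path i -> k -> j.
                via_k = up[i][k] + up[k][j]
                if via_k < up[i][j]:
                    up[i][j] = via_k
                # Lower bound: inverse triangle inequality,
                # d_ij >= l_ik - u_kj and d_ij >= l_kj - u_ik.
                gap = max(lo[i][k] - up[k][j], lo[k][j] - up[i][k])
                if gap > lo[i][j]:
                    lo[i][j] = gap
    # Inconsistent input bounds show up as lower > upper after smoothing.
    for i in range(n):
        for j in range(n):
            if lo[i][j] > up[i][j] + 1e-9:
                raise ValueError(f"inconsistent bounds for pair ({i}, {j})")
    return lo, up
```

For a three-atom chain with fixed 1-2 distances, the unknown 1-3 pair inherits both an upper bound (sum of the two bonds) and a lower bound (their difference).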

→ Details: Bounds Matrix, SMILES Parsing

Phase 2: Embedding → Optimization (Steps 4–6)

Generate 3D (or 4D) coordinates from the distance constraints using distance geometry:

  1. Pick random distances from the smoothed bounds (MinstdRand RNG)
  2. Convert distances to a metric matrix via the Cayley-Menger transform
  3. Extract coordinates via eigendecomposition (power iteration solver)
  4. Minimize distance violations using BFGS with a bounds violation force field

$$T_{ij} = \frac{1}{2}\left(D_{0i} + D_{0j} - d_{ij}^2\right)$$

where $D_{0i} = \frac{1}{N}\sum_{k} d_{ik}^2 - \frac{1}{N^2}\sum_{k<l} d_{kl}^2$
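
As a rough sketch of steps 2 and 3, the following builds the metric matrix from a full distance matrix using the formula above and extracts coordinates from its leading eigenpairs by power iteration with deflation. The function names, the iteration count, and the deflation strategy are assumptions chosen for illustration; sci-form's solver may differ in its details.

```python
import math
import random

def metric_matrix(d):
    """T_ij = 0.5 * (D0i + D0j - d_ij^2), D0i = squared distance to the centroid."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    total = sum(sq[k][l] for k in range(n) for l in range(k + 1, n))
    d0 = [sum(sq[i]) / n - total / n ** 2 for i in range(n)]
    return [[0.5 * (d0[i] + d0[j] - sq[i][j]) for j in range(n)] for i in range(n)]

def top_eigenpairs(t, k=3, iters=500, seed=0):
    """Leading k eigenpairs of a symmetric matrix via power iteration + deflation."""
    rng = random.Random(seed)
    n = len(t)
    a = [row[:] for row in t]
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: remove the component just found before the next pass.
        for i in range(n):
            for j in range(n):
                a[i][j] -= lam * v[i] * v[j]
    return pairs

def embed(d, dim=3):
    """Coordinates x_i[axis] = sqrt(lambda_axis) * v_axis[i]."""
    coords = [[0.0] * dim for _ in d]
    for axis, (lam, v) in enumerate(top_eigenpairs(metric_matrix(d), dim)):
        if lam <= 0.0:
            # Mirrors the "zero or negative eigenvalue" failure case.
            raise ValueError("non-positive eigenvalue; embedding failed")
        for i, x in enumerate(v):
            coords[i][axis] = math.sqrt(lam) * x
    return coords
```

When the input distances come from a real 3D geometry, the recovered coordinates reproduce those distances up to rotation, reflection, and translation. Distances sampled from bounds are generally not exactly embeddable, which is why a minimization step follows.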

→ Details: Distance Geometry, Embedding

Phase 3: Refinement → Output (Steps 7–9)

After obtaining valid 3D coordinates, refine using the ETKDG force field:

  • CSD torsion preferences: 846 SMARTS patterns with Fourier coefficients
  • UFF inversions: Out-of-plane energy for SP2 centers
  • Distance constraints: Maintain bond lengths and angles
  • Validation: Reject conformers failing tetrahedral/planarity/stereo checks
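
To make the "Fourier coefficients" concrete, here is a small sketch of a torsion term of the form $V(\phi) = \sum_k V_k \left(1 + s_k \cos(k\phi)\right)$, which is the cosine-series form described in the ETKDG literature. The function name and the example coefficients are hypothetical; the actual per-pattern coefficients come from the SMARTS table and are not reproduced here.

```python
import math

def torsion_energy(phi, coefficients, signs):
    """Cosine-series torsion energy.

    phi          -- dihedral angle in radians
    coefficients -- V_1..V_m force constants (hypothetical values in the example)
    signs        -- s_1..s_m, each +1 or -1, selecting the phase of each term
    """
    return sum(
        v * (1.0 + s * math.cos(k * phi))
        for k, (v, s) in enumerate(zip(coefficients, signs), start=1)
    )

# Example: a single 3-fold term with s = +1 has its minima at the
# staggered angles (60, 180, 300 degrees).
staggered = torsion_energy(math.radians(60.0), [0.0, 0.0, 1.0], [1, 1, 1])
eclipsed = torsion_energy(0.0, [0.0, 0.0, 1.0], [1, 1, 1])
```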

→ Details: ETKDG Refinement, Force Fields, Validation

The Retry Loop

The pipeline retries the embedding up to $10N$ times. An attempt is discarded and restarted when any of the following occurs:

  1. Metric matrix has zero or negative eigenvalues → retry
  2. Energy/atom after bounds minimization exceeds 0.05 → retry
  3. Tetrahedral centers fail volume test → retry
  4. Chiral volume signs don't match → retry
  5. Planarity check fails → retry
  6. Double-bond geometry is wrong → retry

After $N/4$ consecutive failures, fall back to random box placement (coordinates drawn uniformly from $[-5, 5]^3$).
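
The control flow can be sketched like this. It assumes $N$ is the atom count and that the fallback replaces the eigendecomposition start with random box coordinates; both are interpretations of the description above. `embed_with_retries`, `random_box_coords`, and the `attempt` callback are hypothetical names, not sci-form API.

```python
import random

def random_box_coords(n_atoms, rng, half_width=5.0):
    """Uniform random starting coordinates in a cube of the given half-width."""
    return [[rng.uniform(-half_width, half_width) for _ in range(3)]
            for _ in range(n_atoms)]

def embed_with_retries(n_atoms, attempt, rng):
    """Run `attempt` until it succeeds or the iteration budget is spent.

    `attempt(start)` is a caller-supplied function that returns coordinates,
    or None if any check (eigenvalues, energy, chirality, planarity, ...) fails.
    `start` is None for a normal distance-geometry attempt, or random box
    coordinates once the fallback has been triggered.
    """
    max_iterations = 10 * n_atoms
    fallback_after = max(1, n_atoms // 4)
    failures = 0  # consecutive, since the loop returns on the first success
    for iteration in range(1, max_iterations + 1):
        start = None
        if failures >= fallback_after:
            start = random_box_coords(n_atoms, rng)
        coords = attempt(start)
        if coords is not None:
            return coords, iteration
        failures += 1
    return None, max_iterations
```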


Part 2: Property Computation Pipeline

Once a conformer is generated, sci-form can compute a range of molecular properties.

Gasteiger Charges (no QM)

Fast empirical electronegativity equalization — requires only topology + atomic numbers, runs in microseconds.

→ Details: Population Analysis
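
For orientation, a minimal sketch of the Gasteiger-Marsili iteration (partial equalization of orbital electronegativity) is shown below. The parameter table holds only two atom types, with values as commonly quoted from the 1980 paper; treat them, the atom-type labels, and the function name `peoe_charges` as illustrative assumptions rather than sci-form's parameter set.

```python
# (a, b, c, chi_plus): chi(q) = a + b*q + c*q^2, chi_plus = cation electronegativity.
# Illustrative values; hydrogen uses the special chi_plus of 20.02.
PARAMS = {
    "H": (7.17, 6.24, -0.56, 20.02),
    "C_sp3": (7.98, 9.18, 1.88, 19.04),
}

def peoe_charges(atom_types, bonds, n_iter=6, damping=0.5):
    """Iteratively shift charge along bonds toward the more electronegative atom."""
    q = [0.0] * len(atom_types)
    for step in range(1, n_iter + 1):
        chi = []
        for t, qi in zip(atom_types, q):
            a, b, c, _ = PARAMS[t]
            chi.append(a + b * qi + c * qi * qi)
        delta = [0.0] * len(q)
        for i, j in bonds:
            # Electrons flow from the less to the more electronegative atom.
            donor, acceptor = (i, j) if chi[i] < chi[j] else (j, i)
            chi_plus = PARAMS[atom_types[donor]][3]
            dq = (chi[acceptor] - chi[donor]) / chi_plus * damping ** step
            delta[donor] += dq      # donor loses electrons, becomes more positive
            delta[acceptor] -= dq
        q = [qi + di for qi, di in zip(q, delta)]
    return q

# Methane: atom 0 is carbon, atoms 1-4 are hydrogens.
methane = peoe_charges(["C_sp3", "H", "H", "H", "H"],
                       [(0, 1), (0, 2), (0, 3), (0, 4)])
```

Because every transfer moves charge between two bonded atoms, the total charge is conserved by construction.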

Extended Hückel Theory (EHT)

EHT is the gateway to most quantum properties:

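  1. Build STO-3G basis: contracted Gaussian orbitals for each atom
  2. Overlap matrix $S$: $S_{\mu\nu} = \langle \phi_\mu | \phi_\nu \rangle$
  3. Wolfsberg-Helmholz $H$: $H_{\mu\nu} = \frac{K}{2} S_{\mu\nu} (H_{\mu\mu} + H_{\nu\nu})$
  4. Löwdin orthogonalization: $\tilde{H} = S^{-1/2} H S^{-1/2}$, diagonalize → orbital energies $\varepsilon_i$ and MO coefficients
  5. Fill $N_e$ electrons from lowest orbital up → HOMO/LUMO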
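
Steps 3 to 5 can be illustrated on the smallest possible case, two basis functions. The sketch below assumes $K = 1.75$ (the conventional choice, not stated on this page) and takes the overlap and diagonal energies as inputs instead of computing them from a basis set; all function names are hypothetical.

```python
import math

K = 1.75  # assumed Wolfsberg-Helmholz constant

def eigh2(m):
    """Eigenvalues and eigenvectors (as columns) of a symmetric 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    values = [a * c * c + 2 * b * c * s + d * s * s,
              a * s * s - 2 * b * c * s + d * c * c]
    return values, [[c, -s], [s, c]]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv_sqrt(s):
    """S^(-1/2) = V diag(lambda^(-1/2)) V^T."""
    vals, vecs = eigh2(s)
    diag = [[vals[0] ** -0.5, 0.0], [0.0, vals[1] ** -0.5]]
    vt = [[vecs[j][i] for j in range(2)] for i in range(2)]
    return matmul(matmul(vecs, diag), vt)

def eht_two_orbital(h11, h22, s12):
    """Orbital energies for a two-function EHT problem, lowest first."""
    s = [[1.0, s12], [s12, 1.0]]
    h12 = 0.5 * K * s12 * (h11 + h22)      # Wolfsberg-Helmholz off-diagonal
    h = [[h11, h12], [h12, h22]]
    x = inv_sqrt(s)
    h_tilde = matmul(matmul(x, h), x)      # Loewdin-orthogonalized Hamiltonian
    energies, _ = eigh2(h_tilde)
    return sorted(energies)

def homo_lumo(energies, n_electrons):
    """Fill orbitals two electrons at a time; LUMO is None if none is empty."""
    n_occ = (n_electrons + 1) // 2
    lumo = energies[n_occ] if n_occ < len(energies) else None
    return energies[n_occ - 1], lumo

# H2-like example: two 1s functions at -13.6 eV with an illustrative overlap of 0.6.
levels = eht_two_orbital(-13.6, -13.6, 0.6)
homo, lumo = homo_lumo(levels, 2)
```

For two identical functions the result can be checked by hand: the energies are $(H_{11} \pm H_{12}) / (1 \pm S_{12})$.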

From EHT:

| Property | Method | Key Output |
| --- | --- | --- |
| Population analysis | Mulliken/Löwdin | Per-atom charges |
| Dipole moment | Bond + lone-pair | Vector + Debye magnitude |
| Orbital grids | STO-3G on 3D grid | Float32 volumetric array |
| Isosurface mesh | Marching cubes | Vertices, normals, triangles |
| DOS/PDOS | Gaussian smearing | Total DOS, per-atom DOS |

→ Details: Density of States, Population Analysis, Dipole Moments
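
As an illustration of the Gaussian smearing used for the DOS, the sketch below places a normalized Gaussian on each orbital energy and sums them on an energy grid. The function name and the default width `sigma` are assumptions for the example; a per-atom PDOS would additionally weight each Gaussian by that atom's share of the orbital.

```python
import math

def gaussian_dos(orbital_energies, energy_grid, sigma=0.2):
    """Total DOS: one unit-area Gaussian of width sigma per orbital energy."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [
        sum(norm * math.exp(-0.5 * ((e - eps) / sigma) ** 2)
            for eps in orbital_energies)
        for e in energy_grid
    ]

# Two levels smeared onto a grid from -30 to 10 (same units as the energies).
step = 0.01
grid = [-30.0 + step * i for i in range(4001)]
dos = gaussian_dos([-17.425, 1.7], grid)
```

Since each Gaussian has unit area, the DOS integrates to the number of orbitals, which is a convenient sanity check.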

Electrostatic Potential

Coulomb ESP on a 3D grid from Mulliken charges:

$$V(\mathbf{r}) = \sum_i \frac{q_i^{\text{Mulliken}}}{|\mathbf{r} - \mathbf{r}_i|}$$

Color-mapped (red = negative, white = zero, blue = positive). Gaussian Cube export.
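
A minimal sketch of the point-charge sum follows. It assumes atomic units (charges in $e$, distances in bohr), which this page does not specify, and it clamps very short distances so that a grid point landing on an atom does not divide by zero. The names `esp_at` and `min_dist` are hypothetical.

```python
import math

def esp_at(point, positions, charges, min_dist=1e-6):
    """Coulomb potential at `point` from point charges at `positions`."""
    total = 0.0
    for pos, q in zip(positions, charges):
        r = max(math.dist(point, pos), min_dist)  # clamp to avoid division by zero
        total += q / r
    return total

def esp_grid(origin, spacing, shape, positions, charges):
    """Evaluate the potential on a regular grid, returned as a flat list (x slowest)."""
    nx, ny, nz = shape
    return [
        esp_at((origin[0] + i * spacing,
                origin[1] + j * spacing,
                origin[2] + k * spacing), positions, charges)
        for i in range(nx) for j in range(ny) for k in range(nz)
    ]
```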

→ Details: Electrostatic Potential

Force Fields

  • UFF — Universal Force Field, 50+ element types (including transition metals)
  • MMFF94 — Merck Molecular Force Field: quartic stretch, cubic bend, 3-term Fourier torsion, Halgren 14-7 vdW
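
To show what "Halgren 14-7 vdW" refers to, here is a sketch of the buffered 14-7 pair potential in its published functional form. The function name is hypothetical, and the combination rules that produce `r_star` and `eps` for a given atom pair are not shown.

```python
def buffered_14_7(r, r_star, eps):
    """Buffered 14-7 van der Waals energy for one atom pair.

    r      -- interatomic distance
    r_star -- minimum-energy separation for the pair
    eps    -- well depth
    """
    repulsive = (1.07 * r_star / (r + 0.07 * r_star)) ** 7
    attractive = 1.12 * r_star ** 7 / (r ** 7 + 0.12 * r_star ** 7) - 2.0
    return eps * repulsive * attractive
```

The buffering constants (0.07 and 0.12) keep the energy finite as `r` approaches zero, and the form evaluates to `-eps` at `r == r_star`.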

→ Details: Force Fields, Strain Energy

Molecular Alignment

Two algorithms:

  • Kabsch SVD: optimal rotation, $O(N \min(N,3)^2)$
  • Quaternion alignment: Coutsias 2004 $4 \times 4$ eigenproblem, faster for large $N$
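
The quaternion approach can be sketched as follows: build the symmetric 4×4 key matrix from the cross-covariance of the centered point sets, take the eigenvector of its largest eigenvalue as a unit quaternion, and convert that to a rotation. This example uses the Horn-style key matrix and a shifted power iteration for the eigenvector; both are choices made for a dependency-free illustration and are not necessarily how sci-form solves the eigenproblem. `superpose` and `rmsd` are hypothetical names.

```python
import math

def centroid(points):
    n = len(points)
    return [sum(p[a] for p in points) / n for a in range(3)]

def superpose(mobile, target, iters=500):
    """Return `mobile` rotated and translated onto `target` (least squares)."""
    cm, ct = centroid(mobile), centroid(target)
    p = [[x - c for x, c in zip(pt, cm)] for pt in mobile]
    q = [[x - c for x, c in zip(pt, ct)] for pt in target]
    # Cross-covariance S_ab = sum_i p_ia * q_ib.
    s = [[sum(pi[a] * qi[b] for pi, qi in zip(p, q)) for b in range(3)]
         for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    n = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
         [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
         [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
         [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    # Shift by a Gershgorin bound so the largest eigenvalue is also the
    # largest in magnitude, then run power iteration.
    shift = max(sum(abs(x) for x in row) for row in n)
    v = [1.0, 0.5, 0.25, 0.125]  # arbitrary start vector
    for _ in range(iters):
        w = [sum(n[i][j] * v[j] for j in range(4)) + shift * v[i] for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    qw, qx, qy, qz = v
    r = [[qw*qw + qx*qx - qy*qy - qz*qz, 2*(qx*qy - qw*qz), 2*(qx*qz + qw*qy)],
         [2*(qx*qy + qw*qz), qw*qw - qx*qx + qy*qy - qz*qz, 2*(qy*qz - qw*qx)],
         [2*(qx*qz - qw*qy), 2*(qy*qz + qw*qx), qw*qw - qx*qx - qy*qy + qz*qz]]
    return [[sum(r[i][k] * pt[k] for k in range(3)) + ct[i] for i in range(3)]
            for pt in p]

def rmsd(a, b):
    return math.sqrt(sum(math.dist(x, y) ** 2 for x, y in zip(a, b)) / len(a))
```

The cost is one pass over the atoms to build the 3×3 cross-covariance plus a fixed-size 4×4 eigenproblem, so it scales linearly with $N$.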

→ Details: Molecular Alignment

Materials Assembly

Node/linker SBUs + topology → periodic crystal structure (MOF-type).

→ Details: Materials Assembly


References

  • Riniker & Landrum, J. Chem. Inf. Model. 2015, 55, 2562 (ETKDGv2)
  • Wang et al., J. Chem. Inf. Model. 2020, 60, 2044 (ETKDGv3)
  • Blaney & Dixon, Rev. Comput. Chem. 1994, 5, 299 (Distance geometry)
  • Wolfsberg & Helmholz, J. Chem. Phys. 1952, 20, 837 (EHT)
  • Mulliken, J. Chem. Phys. 1955, 23, 1833 (Population analysis)
  • Gasteiger & Marsili, Tetrahedron 1980, 36, 3219 (Charge equalization)
  • Coutsias et al., J. Comput. Chem. 2004, 25, 1849 (Quaternion alignment)
  • Halgren, J. Comput. Chem. 1996, 17, 490 (MMFF94)

Released under the MIT License.