Algorithm Overview
sci-form is a computational chemistry library with two main areas: 3D conformer generation and quantum-chemistry-inspired property computation. This page describes both pipelines at a high level.
Part 1: ETKDGv2 Conformer Pipeline
sci-form implements ETKDGv2 (Experimental Torsion Knowledge Distance Geometry v2) to generate 3D molecular conformers from SMILES strings.
The 9-Step Pipeline
Phase 1: Topology → Bounds (Steps 1–3)
Build a molecular graph and derive distance constraints between all atom pairs. The constraints form a bounds matrix of lower and upper distance limits, built as follows:
- 1-2 bounds: Bond lengths from UFF parameters
- 1-3 bounds: From bond lengths + equilibrium angles (law of cosines)
- 1-4 bounds: From torsion angle cis/trans extremes
- VDW bounds: Lower bounds from van der Waals radii
- Smoothing: Floyd-Warshall to enforce the triangle inequality
→ Details: Bounds Matrix, SMILES Parsing
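The 1-3 bound and the smoothing step from Phase 1 can be sketched as follows. This is a minimal illustration, not sci-form's code: the function names `angle_distance` and `smooth_bounds` are hypothetical, and the bounds are assumed to be stored as full symmetric nested lists.

```python
import math


def angle_distance(r_ab: float, r_bc: float, theta_deg: float) -> float:
    """1-3 distance from two bond lengths and the angle between them (law of cosines)."""
    theta = math.radians(theta_deg)
    return math.sqrt(r_ab**2 + r_bc**2 - 2.0 * r_ab * r_bc * math.cos(theta))


def smooth_bounds(upper, lower):
    """Enforce the triangle inequality on symmetric bounds matrices, in place.

    Upper bounds are tightened first with a Floyd-Warshall shortest-path pass;
    lower bounds are then raised using the smoothed upper bounds.
    """
    n = len(upper)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = upper[i][k] + upper[k][j]
                if i != j and via_k < upper[i][j]:
                    upper[i][j] = via_k
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                # d(i,j) >= d(i,k) - d(k,j) and d(i,j) >= d(k,j) - d(i,k)
                candidate = max(lower[i][k] - upper[k][j], lower[k][j] - upper[i][k])
                if candidate > lower[i][j]:
                    lower[i][j] = candidate
    return upper, lower
```

For a three-atom chain where only the two bonded pairs carry tight bounds, smoothing replaces the placeholder 1-3 upper bound with the sum of the two bond upper bounds and raises the 1-3 lower bound to their difference.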
Phase 2: Embedding → Optimization (Steps 4–6)
Generate 3D (or 4D) coordinates from the distance constraints using distance geometry:
- Pick random distances from the smoothed bounds (MinstdRand RNG)
- Convert distances to a metric matrix via the Cayley-Menger transform
- Extract coordinates via eigendecomposition (power iteration solver)
- Minimize distance violations using BFGS with a bounds violation force field
→ Details: Distance Geometry, Embedding
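The embedding steps of Phase 2 can be illustrated with a small, dependency-free sketch. It assumes the textbook distance-to-centroid construction of the metric matrix and a plain power iteration with deflation; the helper names (`metric_matrix`, `top_eigenpairs`, `embed`) are hypothetical, and sci-form's actual solver may differ in details such as convergence criteria and random sampling.

```python
import math
import random


def metric_matrix(d):
    """Metric (Gram) matrix from a full distance matrix, with the centroid as origin."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    pair_sum = sum(sq[j][k] for j in range(n) for k in range(j + 1, n)) / n**2
    d0 = [sum(sq[i]) / n - pair_sum for i in range(n)]  # squared distance to centroid
    return [[0.5 * (d0[i] + d0[j] - sq[i][j]) for j in range(n)] for i in range(n)]


def top_eigenpairs(g, count, iters=500, seed=42):
    """Largest eigenpairs of a symmetric matrix via power iteration and deflation."""
    rng = random.Random(seed)
    n = len(g)
    a = [row[:] for row in g]
    pairs = []
    for _ in range(count):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                a[i][j] -= lam * v[i] * v[j]
    return pairs


def embed(d, dim=3):
    """Coordinates whose pairwise distances reproduce d (if d is embeddable in dim)."""
    coords = [[0.0] * dim for _ in d]
    for m, (lam, v) in enumerate(top_eigenpairs(metric_matrix(d), dim)):
        if lam <= 0.0:
            # Mirrors the retry condition: a non-positive eigenvalue means a failed embedding.
            raise ValueError("non-positive eigenvalue; embedding failed")
        for i, vi in enumerate(v):
            coords[i][m] = math.sqrt(lam) * vi
    return coords
```

In the real pipeline the input distances are random picks between the smoothed bounds, so the embedded coordinates only approximate them; the subsequent minimization then reduces the remaining bounds violations.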
Phase 3: Refinement → Output (Steps 7–9)
After obtaining valid 3D coordinates, refine using the ETKDG force field:
- CSD torsion preferences: 846 SMARTS patterns with Fourier coefficients
- UFF inversions: Out-of-plane energy for SP2 centers
- Distance constraints: Maintain bond lengths and angles
- Validation: Reject conformers failing tetrahedral/planarity/stereo checks
→ Details: ETKDG Refinement, Force Fields, Validation
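To make the "Fourier coefficients" of the torsion preferences concrete, the sketch below evaluates a six-term cosine series of the form used in the ETKDG literature, where each term has a force constant and a sign. The function name and the coefficient values in the usage note are hypothetical and are not taken from sci-form's pattern table.

```python
import math


def torsion_energy(phi_deg, k, s):
    """Cosine series: sum over n = 1..len(k) of k[n-1] * (1 + s[n-1] * cos(n * phi))."""
    phi = math.radians(phi_deg)
    return sum(
        kn * (1.0 + sn * math.cos(n * phi))
        for n, (kn, sn) in enumerate(zip(k, s), start=1)
    )
```

With only the threefold term switched on, for example `k = [0, 0, 1, 0, 0, 0]` and `s = [1] * 6`, the energy is lowest at the staggered angles (60°, 180°, 300°) and highest at the eclipsed ones.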
The Retry Loop
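The pipeline runs inside a retry loop with a capped number of attempts. An attempt is discarded and retried when any of the following occurs: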
- Metric matrix has zero or negative eigenvalues → retry
- Energy/atom after bounds minimization exceeds 0.05 → retry
- Tetrahedral centers fail volume test → retry
- Chiral volume signs don't match → retry
- Planarity check fails → retry
- Double-bond geometry is wrong → retry
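The control flow of the retry loop can be sketched as below, together with one of the geometric checks (the sign of a chiral volume). Everything here is illustrative: `embed_with_retries`, `signed_volume`, the callback signatures, and the choice to return `None` once the attempts are exhausted are assumptions, not sci-form's API.

```python
import random


def signed_volume(center, a, b, c):
    """Scalar triple product of three neighbour vectors around a centre atom."""
    u = [a[i] - center[i] for i in range(3)]
    v = [b[i] - center[i] for i in range(3)]
    w = [c[i] - center[i] for i in range(3)]
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return sum(u[i] * cross[i] for i in range(3))


def embed_with_retries(try_embed, checks, max_attempts, seed=0):
    """Call try_embed(rng) until every check passes or the attempts run out.

    try_embed returns coordinates, or None if the embedding itself failed.
    Returns (coords, attempts_used); coords is None if no attempt succeeded.
    """
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        coords = try_embed(rng)
        if coords is not None and all(check(coords) for check in checks):
            return coords, attempt
    return None, max_attempts
```

Passing the checks as a list keeps each rejection criterion (eigenvalues, energy per atom, tetrahedral volume, chirality, planarity, double-bond geometry) as a separate, independently testable predicate.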
Part 2: Property Computation Pipeline
Once a conformer is generated, sci-form can compute a range of molecular properties.
Gasteiger Charges (no QM)
Fast empirical electronegativity equalization — requires only topology + atomic numbers, runs in microseconds.
→ Details: Population Analysis
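A compact sketch of the iterative charge equalization (PEOE) scheme of Gasteiger and Marsili is shown below for illustration. The parameter table covers only hydrogen and sp3 carbon, with values as commonly tabulated for the method; the atom-type labels, the function name, and the data layout are hypothetical, and a real implementation needs a full, verified parameter set.

```python
# (a, b, c, chi_plus): electronegativity chi(q) = a + b*q + c*q^2, and the
# electronegativity of the cation used to scale the charge transfer.
# Illustrative subset only; verify against the original publication before use.
PARAMS = {
    "H": (7.17, 6.24, -0.56, 20.02),
    "C.sp3": (7.98, 9.18, 1.88, 19.04),
}


def gasteiger_charges(atom_types, bonds, n_iter=6):
    """Partial charges from topology only; bonds is a list of (i, j) index pairs."""
    n = len(atom_types)
    q = [0.0] * n
    for step in range(1, n_iter + 1):
        chi = []
        for t, qi in zip(atom_types, q):
            a, b, c, _ = PARAMS[t]
            chi.append(a + b * qi + c * qi * qi)
        dq = [0.0] * n
        for i, j in bonds:
            # Electrons flow from the less to the more electronegative atom.
            donor, acceptor = (i, j) if chi[j] > chi[i] else (j, i)
            chi_plus = PARAMS[atom_types[donor]][3]
            transfer = (chi[acceptor] - chi[donor]) / chi_plus * 0.5**step
            dq[donor] += transfer
            dq[acceptor] -= transfer
        q = [qi + d for qi, d in zip(q, dq)]
    return q
```

For methane (`["C.sp3", "H", "H", "H", "H"]` with four C-H bonds) this yields a slightly negative carbon and slightly positive hydrogens, and the charges sum to zero because every transfer is applied symmetrically.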
Extended Hückel Theory (EHT)
EHT is the gateway to most quantum properties:
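- Build STO-3G basis: contracted Gaussian orbitals for each atom
- Overlap matrix S: overlap integrals between every pair of basis functions
- Wolfsberg-Helmholz H: off-diagonal Hamiltonian elements built from the overlaps and the diagonal (on-site) energies
- Löwdin orthogonalization: transform H with S^(-1/2), then diagonalize → orbital energies and MO coefficients
- Fill electrons from the lowest orbital up → HOMO/LUMO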
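The Wolfsberg-Helmholz and Löwdin steps can be followed by hand on the smallest possible case: two identical orbitals with on-site energy `alpha` and mutual overlap `s`. The sketch below is a toy model under stated assumptions, not sci-form's solver: the function name is hypothetical, and the constant `k = 1.75` is the conventional choice in the EHT literature rather than a value confirmed by this page.

```python
import math


def eht_two_orbital(alpha, s, k=1.75):
    """Orbital energies (low, high) for two identical orbitals with overlap s (0 <= s < 1)."""
    # Wolfsberg-Helmholz: H_12 = K * S_12 * (H_11 + H_22) / 2
    h12 = k * s * (alpha + alpha) / 2.0

    # S = [[1, s], [s, 1]] has eigenvalues 1 + s and 1 - s, so
    # S^(-1/2) = [[a, b], [b, a]] with:
    p = 1.0 / math.sqrt(1.0 + s)
    m = 1.0 / math.sqrt(1.0 - s)
    a, b = (p + m) / 2.0, (p - m) / 2.0

    # Lowdin-transformed Hamiltonian H' = S^(-1/2) H S^(-1/2) = [[diag, off], [off, diag]]
    diag = (a * a + b * b) * alpha + 2.0 * a * b * h12
    off = 2.0 * a * b * alpha + (a * a + b * b) * h12

    # A symmetric 2x2 matrix with equal diagonal has eigenvalues diag +/- off.
    return tuple(sorted((diag + off, diag - off)))
```

The two results reduce to `(alpha + h12) / (1 + s)` and `(alpha - h12) / (1 - s)`. With two electrons, the lower level is the doubly occupied HOMO and the upper level is the LUMO.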
From EHT:
| Property | Method | Key Output |
|---|---|---|
| Population analysis | Mulliken/Löwdin | Per-atom charges |
| Dipole moment | Bond + lone-pair | Vector + Debye magnitude |
| Orbital grids | STO-3G on 3D grid | Float32 volumetric array |
| Isosurface mesh | Marching cubes | Vertices, normals, triangles |
| DOS/PDOS | Gaussian smearing | Total DOS, per-atom DOS |
→ Details: Density of States, Population Analysis, Dipole Moments
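Gaussian smearing, as listed for DOS/PDOS in the table above, replaces each discrete orbital energy with a normalized Gaussian. The following is a minimal sketch; the function name, the `sigma` default, and the optional `weights` argument (for example, per-atom orbital populations for a projected DOS) are assumptions made for illustration.

```python
import math


def gaussian_dos(energies, grid, sigma=0.2, weights=None):
    """Density of states on an energy grid: one normalized Gaussian per level."""
    if weights is None:
        weights = [1.0] * len(energies)
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [
        sum(w * norm * math.exp(-0.5 * ((e - ei) / sigma) ** 2)
            for ei, w in zip(energies, weights))
        for e in grid
    ]
```

Because each Gaussian integrates to one, the area under the total DOS equals the number of levels, which is a convenient sanity check when choosing the grid range and spacing.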
Electrostatic Potential
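Coulomb ESP evaluated on a 3D grid from Mulliken point charges: each grid value is the sum over atoms of the atomic charge divided by its distance to the grid point.
The grid is color-mapped (red = negative, white = zero, blue = positive) and can be exported as a Gaussian Cube file.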
→ Details: Electrostatic Potential
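A point-charge Coulomb potential on a regular grid can be sketched in a few lines. The names `esp_at` and `esp_grid` are hypothetical; the result is in atomic units if charges are in units of e and coordinates in bohr, and the small `min_dist` floor is an assumption added to avoid division by zero when a grid point coincides with an atom.

```python
import math


def esp_at(point, charges, positions, min_dist=1e-6):
    """Coulomb potential at one point from a set of point charges."""
    total = 0.0
    for q, r in zip(charges, positions):
        total += q / max(math.dist(point, r), min_dist)
    return total


def esp_grid(charges, positions, origin, spacing, shape):
    """Flat list of ESP values on a regular grid, with z varying fastest."""
    nx, ny, nz = shape
    return [
        esp_at(
            (origin[0] + i * spacing, origin[1] + j * spacing, origin[2] + k * spacing),
            charges,
            positions,
        )
        for i in range(nx)
        for j in range(ny)
        for k in range(nz)
    ]
```

The loop order (x outermost, z innermost) matches the value ordering of the Gaussian Cube format, so the flat list can be written out directly after the header.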
Force Fields
- UFF — Universal Force Field, 50+ element types (including transition metals)
- MMFF94 — Merck Molecular Force Field: quartic stretch, cubic bend, 3-term Fourier torsion, Halgren 14-7 vdW
→ Details: Force Fields, Strain Energy
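Two of the MMFF94 terms named above have compact closed forms, shown here as a sketch following Halgren's published functional forms. The function names and argument conventions are hypothetical; energies are in kcal/mol when `kb` is in md/Å, distances are in Å, and `eps` is in kcal/mol.

```python
def mmff_stretch(kb, dr, cs=-2.0):
    """Quartic bond stretch; dr = r - r0 in angstrom, cs is the cubic-stretch constant."""
    return 143.9325 * kb / 2.0 * dr * dr * (1.0 + cs * dr + 7.0 / 12.0 * cs * cs * dr * dr)


def buffered_14_7(r, r_star, eps):
    """Halgren buffered 14-7 van der Waals term; minimum of -eps at r = r_star."""
    rho = r / r_star
    return eps * (1.07 / (rho + 0.07)) ** 7 * (1.12 / (rho**7 + 0.12) - 2.0)
```

The buffering constants (0.07 and 0.12) keep the repulsive wall finite at very short distances, unlike a Lennard-Jones 12-6 term, and the negative cubic constant makes compression of a bond costlier than stretching it by the same amount.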
Molecular Alignment
Two algorithms:
- Kabsch SVD: optimal rotation from a singular value decomposition of the covariance matrix
- Quaternion alignment: Coutsias 2004 4×4 eigenproblem, faster for large systems
→ Details: Molecular Alignment
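The quaternion route can be illustrated without any linear-algebra library: the minimal RMSD follows from the largest eigenvalue of a symmetric 4×4 matrix built from the coordinate correlations. The sketch below finds that eigenvalue with a shifted power iteration, which is a simplification swapped in for brevity and not necessarily the solver sci-form uses; the function names are hypothetical.

```python
import math


def centered(points):
    """Translate a point set so that its centroid is at the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in points]


def quaternion_rmsd(mobile, reference, iters=500):
    """Minimal RMSD after optimal superposition of two equally ordered point sets."""
    a, b = centered(mobile), centered(reference)
    n = len(a)
    s = [[sum(p[x] * q[y] for p, q in zip(a, b)) for y in range(3)] for x in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    ga = sum(v * v for p in a for v in p)
    gb = sum(v * v for p in b for v in p)
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shifting by (ga + gb) / 2 makes all eigenvalues non-negative, so the
    # power iteration converges to the largest (not the largest-magnitude) one.
    shift = 0.5 * (ga + gb)
    v = [1.0, 0.5, 0.25, 0.125]
    for _ in range(iters):
        w = [sum(key[i][j] * v[j] for j in range(4)) + shift * v[i] for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(key[i][j] * v[j] for j in range(4)) for i in range(4))
    return math.sqrt(max(0.0, (ga + gb - 2.0 * lam) / n))
```

The converged vector `v` is the unit quaternion of the optimal rotation, so the same eigenproblem yields both the RMSD and the superposition. A fixed start vector can fail to converge in the rare case where it is orthogonal to the solution, which a production solver would guard against.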
Materials Assembly
Combines node and linker secondary building units (SBUs) with a target topology to build a periodic crystal structure (MOF-type).
→ Details: Materials Assembly
References
- Riniker & Landrum, J. Chem. Inf. Model. 2015, 55, 2562 (ETKDGv2)
- Wang et al., J. Chem. Inf. Model. 2020, 60, 2044 (ETKDGv3)
- Blaney & Dixon, Rev. Comput. Chem. 1994, 5, 299 (Distance geometry)
- Wolfsberg & Helmholz, J. Chem. Phys. 1952, 20, 837 (EHT)
- Mulliken, J. Chem. Phys. 1955, 23, 1833 (Population analysis)
- Gasteiger & Marsili, Tetrahedron 1980, 36, 3219 (Charge equalization)
- Coutsias et al., J. Comput. Chem. 2004, 25, 1849 (Quaternion alignment)
- Halgren, J. Comput. Chem. 1996, 17, 490 (MMFF94)