# WebGPU Validation Harness
This harness is the browser-side validation layer for the WebGPU-accelerated WASM paths. It is designed to answer two questions with the same report format:

- Does `gpu` or `hybrid` stay numerically aligned with the CPU baseline?
- Where does WebGPU produce real runtime wins over CPU-only execution?
## Scope
The initial harness covers the browser-exposed paths that already implement `cpu`, `gpu`, and `hybrid` execution modes behind the WASM bindings built with the `parallel` and `experimental-gpu` features:
- ESP grid
- EHT orbital grid
- EHT density grid
- EHT orbital mesh
The page is structured so additional WebGPU surfaces can be added later without changing the JSON report shape.
## Runner layout
- Browser harness page: `crates/wasm/examples/vite-webgpu-hybrid/benchmark.html`
- Harness logic: `crates/wasm/examples/vite-webgpu-hybrid/src/benchmark.js`
- Shared cases: `crates/wasm/examples/vite-webgpu-hybrid/src/benchmarkCases.js`
## How to run
Build the WASM package with WebGPU support and start the Vite example:

```bash
npm install
npm run webgpu:benchmark:dev
```

Equivalent example-local commands remain available inside `crates/wasm/examples/vite-webgpu-hybrid`.
## Automated browser run
The Playwright runner launches Chromium with WebGPU flags, opens the harness page, waits for `window.__SCI_FORM_WEBGPU_REPORT__`, and writes the JSON artifact to disk.

```bash
npm run webgpu:test -- --report-out build/webgpu/baseline.json
```

By default the runner uses the lighter smoke profile. Use `--profile full` when you want the full browser workload.
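In outline, the runner does something like the sketch below. This is a condensed, hypothetical version of what `tests/integration/test_webgpu_harness.js` does; the Chromium flags shown are commonly used to enable WebGPU and may not match the runner's exact flag set.

```js
// Condensed sketch of the automated capture (hypothetical, not the real runner).
const fs = require("node:fs");
const { chromium } = require("playwright");

async function captureReport(url, outPath) {
  // WebGPU in headless Chromium typically needs explicit enabling flags.
  const browser = await chromium.launch({
    args: ["--enable-unsafe-webgpu", "--enable-features=Vulkan"],
  });
  const page = await browser.newPage();
  await page.goto(url);
  // The harness publishes the finished report on a well-known global.
  await page.waitForFunction(() => window.__SCI_FORM_WEBGPU_REPORT__ !== undefined);
  const report = await page.evaluate(() => window.__SCI_FORM_WEBGPU_REPORT__);
  fs.writeFileSync(outPath, JSON.stringify(report, null, 2));
  await browser.close();
}
```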
Useful flags:
- `--iterations 5` for more stable timings
- `--profile full` to include the full browser workload instead of the smoke profile
- `--case benzene` to isolate one fixture
- `--allow-skips` to permit CPU-only environments to keep reporting without failing on skipped GPU rows
- `--allow-no-webgpu` to preserve the artifact even when WebGPU initialization fails
Default output path: `build/webgpu/latest-report.json`
Then open the benchmark page. Useful query parameters:

- `?iterations=5` to increase timing stability
- `?case=benzene` to isolate one fixture
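A minimal sketch of how such query parameters can be read on the page side (the real parsing lives in `src/benchmark.js`; the default iteration count here is an assumption):

```js
// Hypothetical query-parameter parsing; the defaults are assumptions.
const params = new URLSearchParams(window.location.search);
const iterations = Number(params.get("iterations") ?? "3");
const onlyCase = params.get("case"); // e.g. "benzene"; null runs every fixture
```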
## Report contract
The page stores the full report in `window.__SCI_FORM_WEBGPU_REPORT__` and can download it as JSON. Each comparison row records:
- molecule case
- operation name
- requested mode (`gpu` or `hybrid`)
- resolved backend string from WASM
- CPU average time
- mode average time
- speedup versus CPU
- maximum absolute deviation
- RMS deviation
- runtime notes and fallback details
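For orientation, a row might look like the following. Field names and values are illustrative only; consult an actual report for the exact schema.

```json
{
  "case": "benzene",
  "operation": "esp-grid",
  "requestedMode": "gpu",
  "resolvedBackend": "webgpu",
  "cpuAvgMs": 128.4,
  "modeAvgMs": 21.7,
  "speedupVsCpu": 5.9,
  "maxAbsDiff": 3.1e-6,
  "rmsDiff": 4.2e-7,
  "notes": []
}
```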
## Validation policy
Each operation uses CPU as the baseline and enforces operation-specific tolerances over the returned numeric payload:
- ESP grid: max absolute difference
- EHT orbital grid: max absolute difference
- EHT density grid: max absolute difference
- EHT orbital mesh: max absolute difference over the backing grid plus triangle-count parity
If WebGPU is unavailable, the harness marks GPU comparisons as skipped instead of treating the browser as a failure.
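In essence, each check reduces to the two deviation metrics reported in every row. A minimal sketch, assuming flat numeric arrays and a per-operation tolerance (names are illustrative, not the harness's exact code):

```js
// Compute max-absolute and RMS deviation of a mode's output against the CPU baseline.
function deviationMetrics(cpuValues, modeValues, tolerance) {
  let maxAbsDiff = 0;
  let sumSquares = 0;
  for (let i = 0; i < cpuValues.length; i++) {
    const diff = Math.abs(cpuValues[i] - modeValues[i]);
    maxAbsDiff = Math.max(maxAbsDiff, diff);
    sumSquares += diff * diff;
  }
  const rmsDiff = Math.sqrt(sumSquares / cpuValues.length);
  return { maxAbsDiff, rmsDiff, pass: maxAbsDiff <= tolerance };
}
```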
## Why this design
The harness is intentionally browser-first and report-driven:
- it exercises the real WASM + browser WebGPU stack rather than only native `wgpu`
- it produces a stable JSON artifact that a Playwright or CI runner can scrape without parsing DOM text
- it keeps correctness and timing in the same run so regressions in speed or accuracy show up together
## Next integration step
The intended automation path was a browser runner that launches Chromium with WebGPU enabled, opens `benchmark.html`, waits for `window.__SCI_FORM_WEBGPU_REPORT__`, and persists the JSON artifact for comparison across CPU/GPU machines.
That runner now lives in `tests/integration/test_webgpu_harness.js`.
## Historical comparison
The second layer compares a new report against a previously accepted baseline:
```bash
npm run webgpu:compare -- \
  --baseline build/webgpu/baseline.json \
  --candidate build/webgpu/latest-report.json
```

Policy file: `tests/fixtures/webgpu_regression_policy.json`
What it checks:
- a comparison that previously passed should not regress to `fail` or `skip`
- numerical drift (`maxAbsDiff`, `rmsDiff`) must stay within policy multipliers and floors
- GPU speedup retention is checked against the previous baseline
- optional absolute mode-time regression checks can be enabled in the policy for same-machine benchmarking
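An illustrative policy shape, with hypothetical keys; see the fixture above for the real schema:

```json
{
  "maxAbsDiffMultiplier": 2.0,
  "maxAbsDiffFloor": 1e-9,
  "rmsDiffMultiplier": 2.0,
  "rmsDiffFloor": 1e-10,
  "minSpeedupRetention": 0.8,
  "modeTime": { "enabled": false, "maxRegressionPercent": 20 }
}
```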
Recommended workflow:
- Generate a trusted baseline report on the target machine.
- Store that baseline outside ignored artifact directories if you want repo-tracked history.
- Compare each new run against that baseline with `npm run webgpu:compare`.
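Using only the commands documented above, the full loop looks like this:

```bash
# One-time on the target machine: capture a trusted baseline.
npm run webgpu:test -- --profile full --report-out build/webgpu/baseline.json

# Each subsequent run: capture a candidate report, then diff it against the baseline.
npm run webgpu:test -- --profile full
npm run webgpu:compare -- \
  --baseline build/webgpu/baseline.json \
  --candidate build/webgpu/latest-report.json
```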