// undine docs

Solver Architecture

Undine is designed as a production-oriented beta tool and as a platform for continued research and experimentation in fluid simulation techniques.

This chapter gathers the architectural material that matters most to technical users, developers, and researchers.

System Overview

The Undine system is intentionally layered so that the Blender-facing experience remains separate from the simulation core.

This separation keeps the user interface flexible while preserving the numerical integrity of the solver.

Solver architecture diagram

Undine is organized as a layered system that connects Blender scene setup to a dedicated simulation core.

graph TD
    Blender["Blender Scene"]

    Addon["Undine Blender Addon"]
    UIPanels["UI Panels"]
    SimConfig["Simulation Configuration"]
    SceneExport["Scene Export"]
    SimControl["Simulation Control"]

    Interface["Simulation Interface Layer"]
    ConfigFiles["Configuration Files"]
    CommandExec["Command Execution"]
    DataExchange["Data Exchange"]

    Core["Undine Simulation Core (C++)"]
    ParticleSystem["Particle System"]
    Neighbor["Neighbor Search"]
    Pipeline["Solver Pipeline"]
    Advection["Advection"]
    Boundary["Boundary Handling"]
    Viscosity["Viscosity"]
    Pressure["Pressure Solver"]
    Redist["Particle Redistribution"]
    Output["Output System"]
    Cache["Particle Cache"]
    MeshGen["Mesh Generation"]

    Blender --> Addon
    Addon --> UIPanels
    Addon --> SimConfig
    Addon --> SceneExport
    Addon --> SimControl
    Addon --> Interface
    Interface --> ConfigFiles
    Interface --> CommandExec
    Interface --> DataExchange
    Interface --> Core
    Core --> ParticleSystem
    Core --> Neighbor
    Core --> Pipeline
    Pipeline --> Advection
    Pipeline --> Boundary
    Pipeline --> Viscosity
    Pipeline --> Pressure
    Pipeline --> Redist
    Core --> Output
    Output --> Cache
    Output --> MeshGen

Layered system view

Blender Scene
|
+-- Undine Blender Addon
|   +-- UI Panels
|   +-- Simulation Configuration
|   +-- Scene Export
|   +-- Simulation Control
|
+-- Simulation Interface Layer
|   +-- Configuration Files
|   +-- Command Execution
|   +-- Data Exchange
|
+-- Undine Simulation Core (C++)
    +-- Particle System
    +-- Neighbor Search
    +-- Solver Pipeline
    |   +-- Advection
    |   +-- Boundary Handling
    |   +-- Viscosity
    |   +-- Pressure Solver
    |   +-- Particle Redistribution
    |
    +-- Output System
        +-- Particle Cache
        +-- Mesh Generation

Philosophy of the Solver

Fluid simulation systems are extremely sensitive to numerical consistency.

Even small mismatches between solver stages can produce visible artifacts, and those issues are often caused by subtle inconsistencies between multiple subsystems rather than by a single isolated algorithm.

For this reason, Undine approaches fluid simulation as a coherent numerical system rather than a collection of isolated features.

Architectural consequences

This philosophy influences the modular stage design, the strict separation between simulation logic and user interface, and the careful validation of numerical changes.

Consistency Across Execution Paths

In many fluid systems, CPU and GPU execution paths can behave slightly differently.

Perfect numerical equivalence is rarely possible in practice, but maintaining comparable solver behavior remains an important design objective because it improves predictability and reproducibility across systems and configurations.

Extensibility and Long-Term Evolution

Fluid simulation research continues to evolve rapidly. New numerical methods, acceleration techniques, and meshing algorithms are regularly developed by the simulation and computer graphics communities.

Because the simulation pipeline is modular, individual solver stages can be replaced or upgraded as new approaches become available.

This design allows Undine to grow over time while maintaining a stable foundation for existing workflows.

Orchestration layer

The orchestration layer sits between the solver core and the addon. It owns the frame loop, the substep loop, route resolution, retry coordination, and stage scheduling — the parts of the system that decide which kernel runs, in what order, on which backend, with which fallback.

Keeping orchestration separate from algorithms is what allows the retry chain (PCG → FP64 → MG → CPU) to exist without leaking into individual stages. Each stage just does its job; the orchestration layer decides whether to escalate.

Pipeline stages

  • Emission — particle creation from emitters, with temporal interpolation across substeps
  • Advection / Forces — gravity, drag, surface tension contribution
  • P2G — particle-to-grid transfer with APIC where enabled
  • Advanced viscosity — optional pro_strain_rate exact-viscosity backend for dense materials
  • Viscoelastic memory — optional CPU-only Maxwell/Oldroyd-B stress transport with yield↔elastic coupling, before pressure when enabled
  • Paste forces — optional CPU-reference cohesion, adhesion, and simplified wetting before pressure when enabled
  • Pressure — projection (PCG, optional V-cycle multigrid, density correction)
  • Viscosity — when enabled (Simple quality preset / Implicit / PRO Lite / PRO Exact), with optional Newtonian/Bingham/Herschel-Bulkley rheology
  • G2P — grid-to-particle transfer with FLIP/PIC blend
  • Collisions — SDF lookup, normal damping, tangential friction, no-slip, slip capture, contact refill
  • Surface tension — when enabled, on the free-surface band
  • Particle redistribution — reseeding, separation, mass-aware coverage
  • Streamflow publish — frame committed to the shared cache
  • Output write — disk persistence of expensive points and (optionally) mesh
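
The ordering above can be sketched as a table of stages with enable flags that an orchestration loop walks once per substep. This is a minimal illustration, not Undine's scheduler: the stage names, the settings keys, and the assumption that publish and output write run once per frame after the last substep are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (stage name, settings key that enables it, or None if always on).
# Names and keys are illustrative assumptions, not Undine identifiers.
STAGE_ORDER: List[Tuple[str, Optional[str]]] = [
    ("emission", None),
    ("advection_forces", None),
    ("p2g", None),
    ("advanced_viscosity", "pro_strain_rate"),
    ("viscoelastic_memory", "viscoelastic_memory"),
    ("paste_forces", "paste_forces"),
    ("pressure", None),
    ("viscosity", "viscosity"),
    ("g2p", None),
    ("collisions", None),
    ("surface_tension", "surface_tension"),
    ("particle_redistribution", None),
]


def active_stages(settings: Dict[str, bool]) -> List[str]:
    """Return the stages that would run in one substep, in pipeline order."""
    return [name for name, flag in STAGE_ORDER
            if flag is None or settings.get(flag, False)]


def run_frame(settings: Dict[str, bool], substeps: int) -> List[Tuple[int, str]]:
    """Walk the substep loop, then the (assumed) per-frame publish and write."""
    trace: List[Tuple[int, str]] = []
    for s in range(substeps):
        for name in active_stages(settings):
            trace.append((s, name))  # a real scheduler would dispatch a kernel here
    trace.append((substeps - 1, "streamflow_publish"))
    trace.append((substeps - 1, "output_write"))
    return trace
```

Keeping the order in data rather than in control flow is one way to let optional stages drop out without touching the stages around them.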

Bricks subsystem

The bricks subsystem is the sparse-grid layer. It tracks active simulation regions as bricks with a halo wide enough for the pressure stencil, validates halo sufficiency / G2P coverage / extrapolation coverage, and decides authority transitions (dense ↔ velocity-pages) based on observed coverage and topology stability.

Authority demotions are logged with reasons (ParticleCore, ParticleHalo, TemporalEmission, ColliderBand, PressureBand, ExtrapolationBand, G2PSampling, PredictedMotion, RetainedTtl). When a sim is demoting often, the reason mask points at the cause.
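
To illustrate how a reason mask can point at the cause, the sketch below models the logged reasons as bit flags and tallies them across a run. The reason names come from the list above; the bit positions, the `DemotionReason` type, and the `tally_reasons` helper are assumptions made for this example.

```python
from collections import Counter
from enum import IntFlag


class DemotionReason(IntFlag):
    # Bit assignments are hypothetical; only the names come from the docs.
    ParticleCore = 1 << 0
    ParticleHalo = 1 << 1
    TemporalEmission = 1 << 2
    ColliderBand = 1 << 3
    PressureBand = 1 << 4
    ExtrapolationBand = 1 << 5
    G2PSampling = 1 << 6
    PredictedMotion = 1 << 7
    RetainedTtl = 1 << 8


def tally_reasons(masks):
    """Count how often each reason bit appears across logged demotion masks."""
    counts = Counter()
    for mask in masks:
        for reason in DemotionReason:
            if mask & reason:
                counts[reason.name] += 1
    return counts
```

Calling `tally_reasons(log).most_common(1)` on a list of logged masks would then surface the dominant reason.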

Advanced viscosity and dense materials

This branch is for dense materials that should retain form longer than water-like liquids: thick honey, chocolate, toothpaste, frosting, creams, and paste-like masses.

It should be described as a system for advanced viscosity and dense materials, not as a full MPM solver, final EVP/plasticity model, or guaranteed production chocolate/frosting simulator.

PRO Exact / THICK_EXACT

pro_strain_rate is exposed as the advanced exact-viscosity backend through numerics.viscosity_mode = "pro_strain_rate". It is host CPU only.

THICK_EXACT is an opt-in preset that configures a stronger numerical foundation for thick materials. It is not viscoelasticity and not material memory by itself; real memory belongs to later layers.

Rheology layer

Between the solver and any memory sits the rheology model: Newtonian (default), Bingham, or Herschel-Bulkley. It turns a scalar viscosity into a material with a real yield-stress and shear-thinning/thickening response.

The non-Newtonian solve is a Picard outer iteration over an apparent viscosity built from the local strain rate, regularized by a gamma-dot epsilon, bounded by nu_min/nu_max clamps, and stabilized across substeps by an optional temporal blend that suppresses yielded/unyielded flicker.
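
The apparent-viscosity evaluation inside such a Picard loop can be sketched as follows. This uses the textbook Herschel-Bulkley form (Bingham is the n = 1 case, Newtonian the case with zero yield stress) and is not taken from Undine's source. The square-root regularization, the linear blend, and all parameter names and defaults are assumptions.

```python
import math


def apparent_viscosity(gamma_dot, k, n=1.0, tau_y=0.0,
                       eps=1e-3, nu_min=1e-6, nu_max=1e3):
    """Regularized, clamped apparent viscosity from a local strain rate.

    nu = tau_y / g + k * g**(n - 1), with g = sqrt(gamma_dot^2 + eps^2)
    so the yield term stays finite as the strain rate goes to zero.
    """
    g = math.sqrt(gamma_dot * gamma_dot + eps * eps)
    nu = tau_y / g + k * g ** (n - 1.0)
    return min(max(nu, nu_min), nu_max)


def temporal_blend(nu_prev, nu_new, alpha=0.5):
    """Blend toward the new value to damp yielded/unyielded flicker."""
    return (1.0 - alpha) * nu_prev + alpha * nu_new
```

In an outer Picard iteration, the velocity solve and this evaluation would alternate until the viscosity field stops changing; only the evaluation step is shown here.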

Minimal viscoelastic memory (VISC-PRO)

The material-memory layer is CPU-first and owns its constitutive state. The extra tensor is stored as six SoA components in a lateral ViscoelasticState, avoiding changes to Core, Grid, or Field ownership.

The stress is stored as tauK, a kinematic stress in world units, so div(tauK) yields an acceleration directly in world units/s^2. The model is regularized Maxwell/Oldroyd-B with tensor transport, CPU strain-rate calculation, stress divergence, explicit coupling, and rollback on non-finite values or validation failures. A regularized yield↔elastic coupling (Papanastasiou/tanh, no hard threshold) gives hold-shape below the yield without substep-sensitive jitter.
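
The storage and rollback ideas can be sketched with a plain Maxwell relaxation step, d(tauK)/dt = (2·nu_p·D − tauK)/lam, applied to six structure-of-arrays components. This is a simplified stand-in: transport, stress divergence, and the yield↔elastic coupling are omitted, and `make_state`, `relax_step`, `nu_p`, and `lam` are hypothetical names rather than Undine's ViscoelasticState API.

```python
import math

# Six independent components of a symmetric 3x3 tensor, one array each (SoA).
COMPONENTS = ("xx", "yy", "zz", "xy", "xz", "yz")


def make_state(n_cells):
    """Allocate a zero stress state: one list per tensor component."""
    return {c: [0.0] * n_cells for c in COMPONENTS}


def relax_step(state, strain_rate, nu_p, lam, dt):
    """One explicit Maxwell step per cell and component.

    Builds the new values in a candidate buffer and commits only if every
    value is finite. Returns True on commit, False on rollback (in which
    case `state` is left untouched).
    """
    candidate = {}
    for c in COMPONENTS:
        new_vals = [t + dt * (2.0 * nu_p * d - t) / lam
                    for t, d in zip(state[c], strain_rate[c])]
        if not all(math.isfinite(v) for v in new_vals):
            return False  # rollback: nothing was written to state
        candidate[c] = new_vals
    state.update(candidate)
    return True
```

Writing into a candidate buffer first is what makes the rollback transactional: a failed step cannot leave a partially updated tensor behind.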

It is OFF by default, and OFF means bitwise off: with memory disabled, the exported job is byte-identical to a job exported without it. The GPU kernel does not exist yet, so under FULL Resident strict the route blocks cleanly unless the user explicitly enables a CPU host bridge. In that case the route declares the resident contract broken by opt-in and records an explicit route_reason. The upper-convected term has an operator, but it is not wired (grad_v=nullptr), so it has no effect yet.

Paste forces and solid contact

PasteForcesStage sits above viscosity and memory. It adds opt-in fluid-fluid cohesion, adhesion to solid SDFs, and simplified wetting/contact-angle behavior in controlled free-surface or wall bands.

Each contribution has hard delta-v clamps per substep and its own metrics. The purpose is to help ribbons, strands, crests, and extruded material stay continuous near geometry, not to create geometric detail that the particle distribution or SDF surface has already lost.
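
A per-contribution clamp with its own metric might look like the sketch below. It assumes the clamp acts on the magnitude of the velocity change, which preserves direction; the source does not say whether the real clamp is per-magnitude or per-component, and the function names and metric layout are hypothetical.

```python
import math


def clamp_delta_v(dv, max_dv):
    """Scale dv so its magnitude does not exceed max_dv.

    Returns (clamped_dv, was_clamped). Direction is preserved.
    """
    mag = math.sqrt(sum(c * c for c in dv))
    if mag <= max_dv or mag == 0.0:
        return tuple(dv), False
    scale = max_dv / mag
    return tuple(c * scale for c in dv), True


def apply_paste_forces(contributions, limits):
    """Clamp each contribution separately, then sum; record per-name metrics."""
    total = [0.0, 0.0, 0.0]
    metrics = {}
    for name, dv in contributions.items():
        clamped, hit = clamp_delta_v(dv, limits[name])
        metrics[name] = {"clamped": hit}
        for i in range(3):
            total[i] += clamped[i]
    return tuple(total), metrics
```

Clamping each contribution before summing keeps one runaway term, for example adhesion near a thin SDF feature, from consuming the budget of the others.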

Release hardening and GPU honesty

ProStrainRate is hardened first as a CPU reference path: real strict mode, transactional rollback, explicit effective preconditioner selection, parseable fallback reason strings, and no silent degradation.

BlockJacobi and Jacobi are the publishable ProStrainRate routes. Multigrid should not be advertised as functional for ProStrainRate if it internally degrades. Until parity is proven, GPU/resident ProStrainRate remains either unsupported in the base release or experimental behind a flag.

GPU multigrid pressure

The PCG path optionally uses a V-cycle multigrid as preconditioner. The smoother is selectable: Jacobi (default, simple, cache-friendly) or symmetric red-black Gauss-Seidel (sym_rbgs) — single-sweep RBGS is intentionally not exposed because it breaks the SPD invariant PCG requires.

MG levels, V-cycles, smoother iterations, and a 'safe retry' policy with target-levels cap are tunable. The retry path uses weighted Jacobi at the coarse level when MG converges poorly on adversarial scenes.

FP32 reductions in the inner PCG loop are computed via warp-shuffle two-stage reductions (no shared-memory bank conflicts), with FP64 accumulation for determinism within a block.

This pressure capability should not be read as automatic multigrid support for every material route. ProStrainRate and dense-material branches only advertise the preconditioners they actually run without silent fallback.

Numerical safety net

Undine treats numerical failure as a predictable event, not an edge case. Three mechanisms protect long bakes:

FP32 tolerance floor

Below ~√N · ε_f32, the FP32 PCG residual is fighting roundoff, not the actual problem. Requested tolerances are therefore clamped from below to this floor, scaled by a configurable safety factor (default 4.0). The applied floor is logged.
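
As a worked example of the floor, the sketch below reads the rule as floor = safety · √N · ε_f32 and takes the larger of the floor and the requested tolerance. That reading, the interpretation of N as the number of unknowns, and the function name are assumptions; the source does not state whether the tolerance is relative or absolute.

```python
import math

EPS_F32 = 2.0 ** -23  # machine epsilon of IEEE 754 binary32, about 1.19e-7


def effective_tolerance(requested, n_unknowns, safety=4.0):
    """Clamp a requested tolerance from below by the FP32 roundoff floor.

    Returns (tolerance_to_use, floor) so the caller can log the floor.
    """
    floor = safety * math.sqrt(n_unknowns) * EPS_F32
    return max(requested, floor), floor
```

For one million unknowns the floor is 4 · 1000 · 2^-23, roughly 4.8e-4, so a requested tolerance of 1e-6 would be raised to the floor while a request of 1e-2 would pass through unchanged.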

True-residual refresh

The CPU PCG path periodically recomputes b - A·x to reset the recurrence residual against drift. This kills the 'silent non-convergence' on long, ill-conditioned solves.
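
The refresh can be sketched on a plain conjugate-gradient loop. The preconditioner is left out to keep the example short, and the dense list-of-lists matrix, the `refresh_every` parameter, and the function names are illustrative assumptions rather than Undine's solver interface.

```python
def matvec(a, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def cg_with_refresh(a, b, tol=1e-10, max_iters=200, refresh_every=10):
    """Unpreconditioned CG for SPD `a`; returns (x, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)                      # r = b - A*x with x = 0
    p = list(r)
    rr = dot(r, r)
    b_norm2 = dot(b, b) or 1.0
    if rr <= tol * tol * b_norm2:
        return x, 0
    for k in range(1, max_iters + 1):
        ap = matvec(a, p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        if k % refresh_every == 0:
            # True-residual refresh: recompute b - A*x to discard drift
            # accumulated by the recurrence below.
            ax = matvec(a, x)
            r = [bi - axi for bi, axi in zip(b, ax)]
        else:
            # Cheap recurrence update, exact only in exact arithmetic.
            r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        if rr_new <= tol * tol * b_norm2:
            return x, k
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iters
```

The refresh costs one extra matrix-vector product each time it fires, which is why it is done periodically rather than every iteration.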

Retry chain

On breakdown (maximum iterations reached, NaN, or an FP32 plateau), the solver escalates: retry → FP64 → multigrid V-cycle → CPU. Each step is logged with its code, iters, rel_res, and the chosen backend.
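
An escalation loop of this shape can be sketched as below, following the chain as written in the orchestration section (PCG → FP64 → MG → CPU). The backend keys, the status code strings, and the `SolveResult` type are hypothetical; only the logged field names (code, iters, rel_res, backend) come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class SolveResult:
    code: str        # e.g. "ok", "max_iters", "nan", "fp32_plateau" (assumed codes)
    iters: int
    rel_res: float


# Escalation order; keys are illustrative names for the four rungs.
CHAIN = ("pcg_fp32", "pcg_fp64", "multigrid", "cpu")


def solve_with_retries(
    backends: Dict[str, Callable[[], SolveResult]],
    log: List[dict],
) -> Tuple[Optional[str], Optional[SolveResult]]:
    """Try each backend in order, logging every attempt; stop at the first success."""
    result: Optional[SolveResult] = None
    for name in CHAIN:
        result = backends[name]()
        log.append({"backend": name, "code": result.code,
                    "iters": result.iters, "rel_res": result.rel_res})
        if result.code == "ok":
            return name, result
    return None, result  # every rung failed; caller decides what to do next
```

Because the loop lives outside the backends, each solver only reports its own outcome, which matches the separation between stages and orchestration described earlier.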