Undine Beta · NVIDIA RTX · CUDA 12 · Windows

Live fluid iteration, inside Blender.

A CUDA-accelerated FLIP / PIC / APIC liquid solver for Blender, built to run on your NVIDIA RTX GPU. Iterate at realtime speeds on water presets where you used to wait minutes per frame, then add Whitewater FX as a separate one-way visual layer for spray, foam, and bubbles without changing the primary solve.

Needs an NVIDIA RTX (CUDA) GPU · Windows-only during the beta

23.6 FPS · water fast-path
~2.9× · vs. CPU path
~60K particles · test scene
RTX 5090 · reference GPU
blender · undine · streamflow viewport
Real-time preview of the Undine fluid solver inside Blender
// water fast-path · ~60k particles · GPU 49 ms/frame vs CPU 171 ms · RTX 5090

// origin

Undine is designed to iterate and reach results fast.

Blender can model, sculpt, and render with the best of them — yet liquid simulation always felt a step behind. The realtime techniques already existed elsewhere: APIC transfer, exact strain-rate viscosity, GPU multigrid pressure, sparse bricks. Blender was simply missing a solver that brought them together.

Undine grew out of that gap. The goal was never another offline baker that makes you wait minutes per frame and guess at the timing — it was a solver you can actually iterate with, where a tweak shows up on the next viewport playback instead of the next sitting.

That idea runs through the whole design: a custom C++20 / CUDA 12 core, a Python addon that never blocks the simulation loop, and a debug panel that tells you the truth (CFL, residual, divergence) before you commit to a bake.

It is still a beta, but the direction is clear: Undine has the potential to become the definitive fluid addon for Blender.

— the Undine project

// solver core

What runs under the hood.

A solver is only as good as the numbers it lets you read. Undine surfaces all of them — every transfer, every iteration, every substep retry.

Methods

Hybrid FLIP / PIC
FLIP for detail, PIC for stability. Per-scene flip_ratio: lean PIC for honey, lean FLIP for thin water.
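A minimal sketch of how a per-scene flip_ratio usually blends the two updates, shown for one velocity component of one particle. The function and parameter names are illustrative assumptions, not Undine's API.

```python
def flip_pic_blend(v_particle_old, v_grid_old, v_grid_new, flip_ratio):
    """Blend PIC and FLIP updates for one velocity component of one particle.

    v_particle_old: the particle's velocity before this substep
    v_grid_old:     grid velocity interpolated at the particle before forces/pressure
    v_grid_new:     grid velocity interpolated at the particle after forces/pressure
    flip_ratio:     0.0 = pure PIC (smooth, damped), 1.0 = pure FLIP (detailed, noisier)
    """
    pic = v_grid_new                                   # overwrite with the new grid value
    flip = v_particle_old + (v_grid_new - v_grid_old)  # add only the grid *change*
    return flip_ratio * flip + (1.0 - flip_ratio) * pic


honey = flip_pic_blend(1.0, 0.5, 0.7, 0.2)    # leans PIC
water = flip_pic_blend(1.0, 0.5, 0.7, 0.95)   # leans FLIP
```

A low ratio (honey) keeps mostly the smoothed grid velocity; a high ratio (thin water) keeps the particle's own history and only adds what the grid changed.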
APIC transfer
Affine Particle-In-Cell transfer. Each particle carries a 3×3 affine velocity matrix C that preserves angular momentum across particle-grid transfers, so vortices no longer die out after 20 frames.
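To show what the affine term buys, here is a minimal 2D particle-to-grid scatter with bilinear weights. It assumes a collocated grid and a 2×2 C per particle (the 3D solver uses 3×3), and it covers only P2G; reconstructing C from grid velocity gradients during G2P is left out. Names are hypothetical.

```python
import math


def apic_p2g_2d(particles, dx):
    """Scatter particle momentum to grid nodes with an APIC affine term.

    particles: list of (x, y, vx, vy, C, mass) with C a 2x2 nested list.
    Returns {(i, j): (vx, vy)} node velocities (momentum / mass).
    Uses bilinear (tent) weights over the 4 surrounding nodes.
    """
    mom, mass = {}, {}
    for (px, py, vx, vy, C, m) in particles:
        i0, j0 = math.floor(px / dx), math.floor(py / dx)
        for i in (i0, i0 + 1):
            for j in (j0, j0 + 1):
                w = (1 - abs(px / dx - i)) * (1 - abs(py / dx - j))  # tent weight
                if w <= 0.0:
                    continue
                rx, ry = i * dx - px, j * dx - py  # node offset from the particle
                # Affine velocity at the node: v_p + C_p (x_i - x_p)
                ax = vx + C[0][0] * rx + C[0][1] * ry
                ay = vy + C[1][0] * rx + C[1][1] * ry
                mx, my = mom.get((i, j), (0.0, 0.0))
                mom[(i, j)] = (mx + w * m * ax, my + w * m * ay)
                mass[(i, j)] = mass.get((i, j), 0.0) + w * m
    return {k: (mom[k][0] / mass[k], mom[k][1] / mass[k]) for k in mom}


# A particle at rest but carrying a unit counterclockwise rotation in C.
spin = [[0.0, -1.0], [1.0, 0.0]]
grid = apic_p2g_2d([(0.5, 0.5, 0.0, 0.0, spin, 1.0)], dx=1.0)
```

The spinning particle still hands its rotation to the surrounding nodes. With C dropped, as in plain PIC, every node would receive zero velocity and the spin would be lost.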
Advanced viscosity & dense materials
A quality ladder (Fast / Balanced / PRO Lite / THICK_EXACT) over an opt-in pro_strain_rate exact-viscosity backend, plus Newtonian / Bingham / Herschel-Bulkley rheology for yield-stress and shear-thinning behavior. THICK_EXACT improves numerical accuracy; it does not add material memory. Real viscoelastic memory (VISC-PRO) and paste forces are separate, experimental, CPU-first routes.
PCG + GPU Multigrid pressure
Preconditioned conjugate gradient with an optional multigrid V-cycle (symmetric red-black Gauss-Seidel smoother). An FP32 tolerance floor that scales with √N, where N is the number of unknowns, keeps the solver from spinning iterations against rounding noise.
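A minimal sketch of the tolerance floor on a small 1D Poisson system, in pure Python with a Jacobi preconditioner standing in for the multigrid V-cycle. The floor formula, ε_FP32 · √N applied to the relative residual, is an assumption about what "scaled by √N" means here, not Undine's verified code.

```python
import math

FP32_EPS = 2.0 ** -23  # FLT_EPSILON, machine epsilon of IEEE-754 single precision


def poisson_1d_apply(x):
    """Multiply by the SPD matrix tridiag(-1, 2, -1) (1D Poisson, Dirichlet ends)."""
    n = len(x)
    return [2.0 * x[i] - (x[i - 1] if i > 0 else 0.0) - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def pcg(b, tol=1e-9, max_iters=500):
    """Jacobi-preconditioned CG with a sqrt(N)-scaled FP32 tolerance floor.

    Returns (x, iterations, effective relative tolerance).
    """
    n = len(b)
    # Relative residuals below ~eps * sqrt(N) are FP32 rounding noise: don't chase them.
    tol_eff = max(tol, FP32_EPS * math.sqrt(n))
    target = tol_eff * math.sqrt(dot(b, b))
    inv_diag = 0.5  # Jacobi preconditioner M^-1 = 1 / diag(A); diag(A) = 2 here
    x = [0.0] * n
    r = list(b)
    z = [inv_diag * ri for ri in r]
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iters):
        if math.sqrt(dot(r, r)) <= target:
            return x, it, tol_eff
        ap = poisson_1d_apply(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [inv_diag * ri for ri in r]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iters, tol_eff
```

With tol=1e-9 and N=64 the floor wins (2⁻²³ · 8 ≈ 9.5e-7): an FP32 solve cannot reliably resolve a smaller relative residual, so asking for one would only burn iterations.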
P2G density correction
Volume drift correction, on the GPU where supported. On the water fast-path the GPU path runs at nearly 3× the FPS of the equivalent CPU path on the same scene (23.6 vs 8.3 FPS, RTX 5090).
Vorticity confinement
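Fedkiw-style two-pass grid operation: compute ω = ∇×v, normalize the gradient of its magnitude, N = ∇|ω| / |∇|ω||, then add the force f = ε h (N × ω) to restore rotation lost to numerical dissipation. Recovers small-scale curl; the strength ε sets how much rotational energy is put back.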
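A minimal two-pass sketch in 2D, assuming a collocated grid and central differences (Fedkiw et al. work on a staggered 3D grid). In 2D ω is a scalar, so N × ω reduces to (N_y ω, −N_x ω). Names are illustrative.

```python
import math


def vorticity_confinement_2d(u, v, h, eps):
    """Two-pass vorticity confinement on a collocated 2D grid.

    u, v: velocity components as lists indexed [j][i]; h: cell size; eps: strength.
    Returns force fields (fx, fy) of the same shape; border cells get zero force.
    """
    ny, nx = len(u), len(u[0])
    # Pass 1: scalar vorticity w = dv/dx - du/dy with central differences.
    w = [[0.0] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            w[j][i] = ((v[j][i + 1] - v[j][i - 1]) - (u[j + 1][i] - u[j - 1][i])) / (2 * h)
    # Pass 2: N = grad|w| / |grad|w||, then f = eps * h * (N x w).
    fx = [[0.0] * nx for _ in range(ny)]
    fy = [[0.0] * nx for _ in range(ny)]
    for j in range(2, ny - 2):
        for i in range(2, nx - 2):
            gx = (abs(w[j][i + 1]) - abs(w[j][i - 1])) / (2 * h)
            gy = (abs(w[j + 1][i]) - abs(w[j - 1][i])) / (2 * h)
            g = math.hypot(gx, gy)
            if g < 1e-12:
                continue  # flat |w|: no direction to push toward
            n_x, n_y = gx / g, gy / g
            # (n_x, n_y, 0) x (0, 0, w) = (n_y * w, -n_x * w, 0)
            fx[j][i] = eps * h * n_y * w[j][i]
            fy[j][i] = -eps * h * n_x * w[j][i]
    return fx, fy
```

On a vortex spinning counterclockwise, the force at a point east of the core points up, along the rotation, which is what re-sharpens the swirl; a uniform flow has no vorticity and receives no force.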
Sparse Bricks
Active simulation regions are tracked as bricks with a halo. Empty space costs almost nothing: the solver only allocates and updates bricks that hold fluid, plus their halo.
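A minimal sketch of brick activation, assuming 8³-cell bricks and a halo measured in whole bricks. Undine's brick size and halo semantics are not stated here, so both are hypothetical parameters.

```python
def active_bricks(occupied_cells, brick_size=8, halo=1):
    """Return the set of brick coordinates to allocate.

    occupied_cells: iterable of integer (i, j, k) cells that contain fluid.
    A brick is active if it holds fluid; `halo` rings of neighbouring bricks
    are added so stencils and advection can step across brick edges.
    """
    core = {(i // brick_size, j // brick_size, k // brick_size)
            for (i, j, k) in occupied_cells}
    active = set()
    for (bi, bj, bk) in core:
        for di in range(-halo, halo + 1):
            for dj in range(-halo, halo + 1):
                for dk in range(-halo, halo + 1):
                    active.add((bi + di, bj + dj, bk + dk))
    return active
```

A real implementation would also clamp brick coordinates to the domain; the sketch keeps negative indices to stay short.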
Scale-aware resolution
Separates voxel size from domain size. Fixed Voxels Per Meter keeps physical resolution stable — small domains become cheaper, not accidentally expensive. Cinematic Scale Lock preserves visual motion across artistic domain resizing.
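The arithmetic behind a fixed Voxels Per Meter setting, as a small sketch. The function name and the rounding rule are assumptions; the 3 cm cell and 1.5 × 1.5 × 1.4 m domain are the measured scene's.

```python
import math


def grid_dims(domain_m, voxels_per_meter):
    """Cells per axis when resolution is fixed in physical units (voxels per metre).

    The inner round() absorbs float noise such as 50.00000000000001 so it
    does not spill into an extra cell.
    """
    return tuple(math.ceil(round(extent * voxels_per_meter, 9)) for extent in domain_m)


vpm = 1 / 0.03                                 # 3 cm cells, as in the measured scene
full = grid_dims((1.5, 1.5, 1.4), vpm)         # the measured test domain
half = grid_dims((0.75, 0.75, 0.7), vpm)       # same water, half-size domain
print(full, math.prod(full))
print(half, math.prod(half))
```

At 3 cm the measured domain comes to 50 × 50 × 47 = 117,500 cells. Halving the domain gives 25 × 25 × 24 = 15,000 cells, roughly an eighth, because the cell size stays put instead of shrinking to keep the count constant.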
SDF collisions
Baked or animated SDFs, a configurable boundary band, multiple colliders, and contact refill. Animated colliders use a CUDA-accelerated bake, split into slabs small enough that no single launch trips the Windows TDR (Timeout Detection and Recovery) watchdog.
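Windows resets a GPU that stays busy longer than the TDR timeout (the TdrDelay registry value, 2 seconds by default), which can kill a long bake kernel on a display GPU. A minimal sketch of sizing slabs against that limit; the per-cell cost estimate and the 25% safety margin are hypothetical inputs, not Undine's numbers.

```python
import math

TDR_DELAY_S = 2.0  # Windows default TdrDelay: longer-running GPU work triggers a reset


def slab_count(nx, ny, nz, ns_per_cell, safety=0.25):
    """Split an SDF bake along z so each slab's kernel fits a fraction of the TDR window.

    Returns (number_of_slabs, layers_per_slab).
    """
    budget_s = TDR_DELAY_S * safety               # stay well clear of the watchdog
    per_layer_s = nx * ny * ns_per_cell * 1e-9    # estimated cost of one z-layer
    layers_per_slab = max(1, int(budget_s // per_layer_s))
    return math.ceil(nz / layers_per_slab), layers_per_slab
```

At an assumed 100 ns per cell, a 512³ bake splits into 27 slabs of 19 layers, each launch budgeted at about 0.5 s.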
Anisotropic meshing
DENSITY_MC scalar route with covariance-matrix anisotropy: per-particle ellipsoids elongated along velocity. Fluid surfaces look like sheets and drips, not marbles.
Native Alembic motion blur
Meshify writes per-vertex .velocities automatically — Lagrangian, sourced from the solver's own field. Enable Motion Blur in your renderer; the blur is already physically aligned with the sim.
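Why solver velocities make blur line up: a liquid mesh is rebuilt every frame, so its vertices have no frame-to-frame correspondence to difference, but a velocity attribute does not need one. A minimal sketch, assuming velocities in metres per second and a shutter expressed as a fraction of the frame (0.5 is a 180° shutter); the helper name is hypothetical.

```python
def blur_offset(velocity_mps, fps, shutter_fraction=0.5):
    """Distance a vertex travels while the shutter is open.

    velocity_mps: per-vertex velocity in metres per second
    shutter_fraction: fraction of the frame the shutter is open (0.5 = 180 degrees)
    """
    dt = shutter_fraction / fps  # seconds the shutter stays open
    return tuple(c * dt for c in velocity_mps)


streak = blur_offset((3.0, 0.0, -1.2), fps=24)
```

A droplet moving at 3 m/s, rendered at 24 fps with a 180° shutter, smears about 6 cm along its velocity.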
Substep retry chain
If pressure breaks down (max iters, NaN, FP32 plateau), the solver promotes the substep through a deterministic chain: retry → FP64 → MG → CPU. You see the path in the log, not a crash.
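A minimal sketch of a deterministic escalation chain. Only code 4 (max iterations) appears in Undine's log; the other status values, the initial "fp32" attempt, and the solve callback are hypothetical.

```python
from enum import Enum


class Status(Enum):
    OK = 0
    MAX_ITERS = 4        # "code=4" is the max-iterations code shown in the log
    NAN = 100            # hypothetical code, for this sketch only
    FP32_PLATEAU = 101   # hypothetical code, for this sketch only


# First attempt, then the escalation order from the text: retry -> FP64 -> MG -> CPU.
CHAIN = ("fp32", "retry", "fp64", "mg", "cpu")


def solve_substep(solve):
    """Try each stage in order until `solve(stage)` returns Status.OK.

    Returns (winning_stage or None, log_lines). It never raises, so a failure
    is reported through the log instead of crashing the run.
    """
    log = []
    for stage in CHAIN:
        status = solve(stage)
        log.append(f"{stage}: code={status.value} {status.name}")
        if status is Status.OK:
            return stage, log
    return None, log
```

Because the order is fixed, the same failing substep always lands on the same stage, and the log records every step it took to get there.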

Measured

# water fast-path · measured, same scene, same hardware
# ~60k particles · 1.5×1.5×1.4 m · 3 cm grid · no collider

backend       CPU        GPU
ms / frame    171        49
fps           8.3        23.6
pcg iters/sub ~137       ~18
──────────────────────────────
num. health   identical (CPU ≡ GPU)
speedup       2.9×

# RTX 5090 · Ryzen 9 7950X · 64 GB · Windows 11

// streamflow

The viewport is the bake.

Streamflow turns the simulation into a live stream. As substeps complete on the GPU, particle frames flow into Blender's playback without ever leaving the device until you ask for them. You scrub, tweak, and re-run in seconds — not in the next sitting.

Pick the rhythm: Streamflow Points for a cheap live point cloud, or Mesh Every Frame for lockstep meshing using the preset visible in the Meshify tab. The runtime contract guarantees the preset you see in the panel is the preset that ran — no silent fallbacks to a faster-but-cruder profile.

  • Live particle cache, GPU-resident between frames
  • Lockstep mesh preview with final_hero / final_ultra presets
  • SNAP playback keeps the viewport in step with the solver
  • SNP5 / wire v6 publishes secondary whitewater payloads only when they are ready
  • Emission and transfer budgets decimate or skip preview work before it blocks the main solve
  • Per-frame triangle count and frame counter — so you see meshing happen
  • Cancel mid-run; cached frames stay valid
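A minimal sketch of the hand-off these bullets describe: the solver publishes a frame only once its substeps are done, the viewport reads the newest complete frame without waiting on the solver, and a cancel leaves published frames readable. The class is hypothetical and holds plain Python objects, whereas Undine keeps frames GPU-resident.

```python
import threading


class FrameCache:
    """Frames become visible only once complete; readers never see partial data."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frames = {}                 # frame index -> finished payload
        self.cancelled = threading.Event()

    def publish(self, index, payload):
        with self._lock:                  # short critical section: one dict insert
            self._frames[index] = payload

    def latest(self):
        with self._lock:
            if not self._frames:
                return None
            newest = max(self._frames)
            return newest, self._frames[newest]

    def frame(self, index):
        with self._lock:
            return self._frames.get(index)


def solver_thread(cache, n_frames, cancel_after):
    for i in range(n_frames):
        if cache.cancelled.is_set():
            return                        # cancel mid-run: published frames stay valid
        payload = [i] * 4                 # stand-in for a finished particle frame
        cache.publish(i, payload)
        if i + 1 == cancel_after:
            cache.cancelled.set()         # simulate the user pressing Cancel
```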
Streamflow live preview running inside the Blender viewport
// streamflow · solver and meshifier sharing one cache

// whitewater fx beta

Spray, foam, and bubbles as a safe visual layer.

Whitewater FX (Beta) is a modular system separated from the main solver. It generates secondary particles for spray, foam, and bubbles as a disposable one-way visual layer. The primary FLIP/APIC particles remain the water that participates in P2G, pressure, G2P, viscosity, the primary cache, and Meshify.

In the viewport, Whitewater runs through a realtime path parallel to Streamflow. It reads primary particles from GPU memory when available, applies emission and transfer budgets, publishes secondary payloads only when ready, and degrades by decimation or frame skip if the budget is exceeded.
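The degrade-by-budget rule can be sketched as a per-frame decision. The linear cost model, the function name, and the minimum keep fraction are assumptions for illustration.

```python
import math


def plan_preview(particle_count, ms_per_particle, budget_ms, min_keep=0.1):
    """Decide how a whitewater preview frame fits its time budget.

    Returns ("full", 1), ("decimate", stride) or ("skip", 0).
    Keeping every `stride`-th particle divides the estimated cost by about `stride`.
    """
    cost_ms = particle_count * ms_per_particle   # assumed linear cost model
    if cost_ms <= budget_ms:
        return "full", 1
    stride = math.ceil(cost_ms / budget_ms)
    if 1.0 / stride < min_keep:
        return "skip", 0      # would keep under min_keep of the particles: drop the frame
    return "decimate", stride
```

Striding keeps every n-th secondary particle, so the preview thins out but stays spatially representative; below the keep fraction the frame is skipped rather than shown almost empty, and the primary solve never waits on it.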

For final quality, Production Bake generates a versioned whitewater cache once a particle/Streamflow cache has been approved. That cache carries render attributes for Geometry Nodes or instancing; it does not feed forces, volume, pressure, viscosity, or the primary Meshify surface back into the simulation.

  • Spray for airborne droplets from impacts, jets, waterfalls, and fast wakes
  • Foam for surface accumulation over wakes, turbulent edges, and impacts
  • Bubbles for underwater aeration with buoyant upward motion
  • Separate versioned <run>/whitewater/frame_XXXXXX.gpfb cache with kind, age, radius, velocity, and id attributes
  • Independent Blender viewport toggles for fluid, spray, foam, and bubbles
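One common way to sort secondary particles into the three kinds is by where they sit relative to the water surface. The sketch below assumes a signed distance φ (negative inside the liquid) and treats the band width, buoyancy, and drag as hypothetical parameters; it is not necessarily how Undine classifies them.

```python
SPRAY, FOAM, BUBBLE = "spray", "foam", "bubble"


def classify(phi, band):
    """Label a secondary particle from its signed distance `phi` to the water surface.

    phi < 0 is inside the liquid; |phi| <= band counts as the surface layer.
    """
    if phi > band:
        return SPRAY
    if phi < -band:
        return BUBBLE
    return FOAM


def step_vertical(kind, z, vz, fluid_vz, dt, g=9.81, buoyancy=2.0, drag=0.5):
    """Advance one secondary particle vertically. One-way: the fluid is only read."""
    if kind == SPRAY:
        vz -= g * dt                                       # ballistic droplet, no air drag
    elif kind == BUBBLE:
        vz += buoyancy * g * dt + drag * (fluid_vz - vz)   # rises, pulled toward the flow
    else:
        vz = fluid_vz                                      # foam rides the surface flow
    return z + vz * dt, vz
```

Nothing here writes to the primary particles, which is what keeps the layer disposable.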
Undine viewport preview with secondary whitewater particles
// one-way secondary particles · meshify consumes primary water only
Undine simulating thick chocolate pouring over cookies and holding its shape on the Blender grid
// dense-material route · viscoelastic chocolate keeping its ridge over cookies

// materials

From thin water to dense, opt-in materials.

Undine includes an advanced route for fluids that do not behave like water: thick honey, chocolate, toothpaste, frosting, and creams that should keep ribbons, ridges, strands, and folds for longer.

It stacks opt-in layers (solver quality, rheology, viscoelastic memory, paste forces, and collider contact) as an explicit numerical path; existing scenes are never silently switched over to it.

Viscosity & VISC-PRO reference →
water

Low-viscosity liquids

FLIP-leaning, APIC transfer, light pressure iterations. Splashes, pours, droplets — surface tension on, vorticity confinement to recover lost curl.

solver

Quality ladder

Fast → Balanced → PRO Lite → THICK_EXACT. PRO Lite is a weighted implicit solve; THICK_EXACT selects the exact pro_strain_rate route for final bakes. The ladder trades speed for numerical quality, not material memory, and the exact route runs on CPU/host only.

rheology

Yield-stress & shear-thinning

Newtonian, Bingham, or Herschel-Bulkley. True yield-stress shape holding and shear-thinning/thickening, with apparent-viscosity clamps and temporal anti-flicker. Newtonian stays the default.
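The apparent viscosity behind these models, as a minimal sketch: τ = τ_y + K γ̇ⁿ and μ_app = τ / γ̇, clamped to a range. Newtonian is τ_y = 0, n = 1; Bingham is n = 1. The clamp bounds and the exponential smoothing that stands in for "temporal anti-flicker" are assumptions, not Undine's settings.

```python
def apparent_viscosity(shear_rate, tau_y, K, n, mu_min=1e-3, mu_max=1e4):
    """Herschel-Bulkley apparent viscosity (Pa*s), clamped to [mu_min, mu_max].

    tau = tau_y + K * shear_rate**n;  mu_app = tau / shear_rate.
    Newtonian: tau_y = 0, n = 1 (mu_app = K).  Bingham: n = 1.
    n < 1 shear-thins, n > 1 shear-thickens.
    """
    rate = max(shear_rate, 1e-12)              # avoid dividing by zero at rest
    mu = (tau_y + K * rate ** n) / rate
    return min(max(mu, mu_min), mu_max)        # clamp keeps the implicit solve conditioned


def smooth(mu_prev, mu_new, alpha=0.5):
    """Exponential smoothing between substeps; a simple stand-in for anti-flicker."""
    return mu_prev + alpha * (mu_new - mu_prev)
```

At rest, a yield-stress material hits the upper clamp and behaves almost rigidly, which is what lets a ridge hold its shape.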

memory

Viscoelastic memory (VISC-PRO)

Opt-in Maxwell/Oldroyd-B memory with a smooth yield↔elastic hold-shape coupling, so chocolate folds and toothpaste crests hold. Experimental, CPU-only, and OFF by default; with it off, the job output is byte-identical.
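The "memory" in a Maxwell model is a stress that builds under shear and relaxes over a time λ. A minimal scalar sketch using the exact exponential update for constant shear over one step; a full Oldroyd-B model adds tensor stress, upper-convected transport, and a solvent viscosity, none of which are shown. Names and units are illustrative.

```python
import math


def maxwell_step(tau, shear_rate, G, lam, dt):
    """Exact update of a scalar Maxwell stress over dt at constant shear rate.

    d(tau)/dt = G * shear_rate - tau / lam
    G: elastic modulus (Pa), lam: relaxation time (s).
    """
    decay = math.exp(-dt / lam)
    return tau * decay + G * shear_rate * lam * (1.0 - decay)
```

Under steady shear the stress settles at G γ̇ λ (a viscosity of Gλ); once shearing stops it decays by a factor of e every λ seconds, so a fold keeps pushing back for a while instead of flattening at once.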

paste

Cohesion, adhesion, wetting

Paste forces can add controlled fluid-fluid cohesion, SDF adhesion, and simplified wetting near walls, nozzles, and free surfaces. They are clamped per substep and measured explicitly.

contact

Collider slip capture

State-based contact response: normal damping, tangential friction, and no-slip. Slip capture lets material slide on impact and grip once it settles, without gluing the fluid or killing splash.
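A minimal sketch of a state-based rule in this spirit: fast tangential motion slides with friction, slow motion is captured and grips. The collider is assumed static, and the capture speed, friction, and damping values are hypothetical.

```python
import math


def slip_capture(v, n, capture_speed, slide_friction, normal_damping, no_slip=False):
    """Hypothetical state-based contact rule at a static collider.

    v: fluid velocity (x, y, z); n: unit collider normal pointing into the fluid.
    """
    vn = sum(a * b for a, b in zip(v, n))          # normal speed (< 0: approaching)
    vt = [a - vn * b for a, b in zip(v, n)]        # tangential part
    if vn < 0.0:
        vn = 0.0                                   # no penetration
    else:
        vn *= 1.0 - normal_damping                 # damp separation, keep some splash
    speed_t = math.sqrt(sum(c * c for c in vt))
    if no_slip or speed_t < capture_speed:
        vt = [0.0, 0.0, 0.0]                       # captured: material grips
    else:
        vt = [c * (1.0 - slide_friction) for c in vt]  # sliding, slowed by friction
    return tuple(t + vn * b for t, b in zip(vt, n))
```

An impact that arrives with tangential speed keeps sliding, while material that has slowed below the capture speed sticks where it settled.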

gpu

Resident route stays protected

Viscoelastic memory and paste behavior start as CPU/host reference routes. On the GPU-resident path they stay unsupported or experimental until parity, stability, and fallback diagnostics are proven.

// performance

Nearly 3× faster than the equivalent CPU path.

Undine runs fluid simulation on the GPU (NVIDIA CUDA). On the water fast-path the solver runs nearly 3× faster than the equivalent CPU path (23.6 vs 8.3 FPS) on the same scene and the same hardware, and the numerical health is identical, so the GPU does not trade accuracy for speed. These are measured figures, not estimates.

CPU path
171 ms / frame
8.3 fps
GPU water fast-path
49 ms / frame
23.6 fps

Water scene, ~60k particles, 1.5 × 1.5 × 1.4 m domain, 3 cm grid, no collider. RTX 5090 · Ryzen 9 7950X · 64 GB · Windows 11. Steady-state average, warm-up frames discarded.

Scope of these figures

These are GPU water fast-path numbers. Absolute FPS depends on particle count and grid resolution — larger scenes are slower in absolute terms, as in any solver, though the GPU advantage holds or grows with scale. Advanced features — PRO Exact viscosity, viscoelastic memory (experimental), and local colliders — run on CPU and are slower; see the viscosity docs.

In active development

The GPU-resident pipeline keeps being optimized. One change in progress is a fully device-resident pressure solver, which would enable CUDA-graph capture and cut per-frame latency further. Expect continued performance gains through the beta.

// diagnostics

No black box. Instability shows up before the bake.

Every substep exposes CFL, Poisson residual, mean and absolute divergence, active cell count, brick state, retry chain step, and CPU↔GPU match. If a scene is going to blow up, you see it while previewing, not four hours into the bake.

  • Live [GPU PROF] per-stage timings, exportable
  • Per-substep CFL and pressure residual in the panel
  • Retry-chain visibility: code=4 max_iters → FP64 → MG → CPU
  • Brick authority state: dense_only · velocity_pages · demoted reason
  • Override readouts: when a scene drifts away from its preset, you see which control did it
  • CPU↔GPU match for catching numerical drift between paths
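Two of these readouts are simple to compute. A minimal sketch of the CFL number and the divergence infinity-norm on a 2D MAC grid; the function names and grid layout are assumptions.

```python
def cfl(max_speed, dt, dx):
    """Cells crossed per substep by the fastest sample; keep it below ~1."""
    return max_speed * dt / dx


def max_abs_divergence(u, v, dx):
    """||div u||_inf on a 2D MAC grid: u is [ny][nx+1], v is [ny+1][nx]."""
    ny, nx = len(u), len(v[0])
    worst = 0.0
    for j in range(ny):
        for i in range(nx):
            div = ((u[j][i + 1] - u[j][i]) + (v[j + 1][i] - v[j][i])) / dx
            worst = max(worst, abs(div))
    return worst
```

For example, a peak speed of 1.4 m/s with a 9 ms substep on a 3 cm grid gives CFL 0.42; these inputs are made up to reproduce the panel's CFL readout, not read from a real run. After a good pressure projection the divergence norm should sit near zero.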
undine · numerical health stable
CFL 0.42
‖∇·u‖∞ 3.4e-5
residual 1.2e-6
retries 0
brick velocity_pages · halo 1
match cpu↔gpu 1.0e-7
cells 1,204,832 active
chain code=4 → fp64 → mg → cpu
// diagnostics panel · CFL, residual, divergence, brick state, retry chain

// who it's for

Three kinds of project where Undine changes the rhythm.

01

Cinematic and VFX

Iterate secondary fluid timing without blocking the main render. The Alembic ships with per-vertex .velocities: correct motion blur on the first render, no auxiliary AOVs, no Geometry Nodes scaffolding.

02

Advertising and motion design

Live client reviews. Tweak viscosity, gravity, surface tension or emission and the next playback shows it — not the next meeting. Streamflow keeps the viewport honest.

03

Indie production

Small teams that can't afford an 8-hour overnight bake just to find out the timing was wrong. One GPU, one license, one Blender instance: the whole pipeline fits on a workstation.

// beta

limited beta release

Start with Undine Beta · $39 USD

Early access pricing for artists who want to use Undine while it is still moving fast. Includes the 0.0.x beta release and updates through 1.0 at no extra cost.

Beta requires an NVIDIA RTX (CUDA) GPU and runs on Windows only. macOS and Linux are not supported yet.

Limited-time launch window · Direct 1.0 upgrade path