ICML 2026 / Neural Operators for PDEs

Hyperbolic Neural Operator

Jieyuan Pei, Zhuoxuan Li, Wei Li, Haobo Zhang, Jiawei Jiang, Jianwei Zheng

Zhejiang University of Technology / Tongji University

HNO reframes neural operator learning as learning the physical approximation structure of fast solvers: resolve near fields directly, compress far fields hierarchically.

6 PDE benchmarks, 2 large-scale CFD tasks, 32k mesh nodes per sample

What We Learn

Not just an infinite-dimensional map, but a fast physical approximation.

Classical neural operators are often described as learning a solution operator between function spaces. HNO shifts the emphasis: for PDE surrogate modeling, the key structure is the same one used by the Fast Multipole Method and hierarchical matrices.

Nearby interactions are kept detailed; distant interactions are routed through compressed summaries. Hyperbolic geometry gives this near-far approximation a continuous, learnable coordinate system.

Figure: Near-field, far-field, hierarchical tree, and hyperbolic geometry motivation. HNO targets the same near-far approximation pattern used by FMM and hierarchical matrices: precise local interactions plus compressed far-field summaries.

Why Hyperbolic Geometry

Physical interactions are not uniformly expensive.

01. Near-far decomposition

Nearby interactions need fine precision, while distant regions are often compressible into lower-rank summaries.

02. Tree-like structure

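Fast solvers exploit recursive hierarchies, but Euclidean space distorts exponentially branching structures: its volume grows only polynomially with radius, whereas hyperbolic volume grows exponentially, matching the growth of a tree.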

03. Lorentz routing

HNO defines a stabilized Gibbs kernel from hyperbolic geodesic distances, giving attention an explicit scale coordinate.

Method

Attention as a continuous hyperbolic kernel.

Query and key features are lifted to the Lorentz hyperboloid. Their stabilized geodesic distance is converted into attention weights, so the model learns a near-far routing kernel rather than an unrestricted Euclidean token mixer.
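The lift-then-distance pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes a unit-curvature hyperboloid, uses the exponential map at the origin as the lift, takes a scalar temperature `tau`, and uses a simple clamp as the stabilization; the page does not specify any of these choices. The function names are hypothetical.

```python
import numpy as np

def lift_to_hyperboloid(v):
    """Exponential map at the origin of the unit hyperboloid (assumed lift).

    v: (n, d) Euclidean features -> (n, d+1) points x with <x, x>_L = -1.
    """
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    safe = np.maximum(norm, 1e-9)          # avoid 0/0 for zero vectors
    time = np.cosh(norm)                   # time-like coordinate x0
    space = np.sinh(norm) * v / safe       # space-like coordinates
    return np.concatenate([time, space], axis=-1)

def lorentz_inner(x, y):
    """Pairwise Lorentz inner product: -x0*y0 + sum_i xi*yi."""
    return -np.outer(x[:, 0], y[:, 0]) + x[:, 1:] @ y[:, 1:].T

def hyperbolic_attention(q, k, v, tau=1.0, eps=1e-6):
    """Gibbs attention weights from negative Lorentz geodesic distance."""
    xq, xk = lift_to_hyperboloid(q), lift_to_hyperboloid(k)
    # Clamp so arccosh stays defined under round-off (assumed stabilization).
    arg = np.maximum(-lorentz_inner(xq, xk), 1.0 + eps)
    dist = np.arccosh(arg)                 # geodesic distance, shape (nq, nk)
    logits = -dist / tau                   # closer keys get larger logits
    logits = logits - logits.max(axis=-1, keepdims=True)
    w = np.exp(logits)
    w = w / w.sum(axis=-1, keepdims=True)  # each row sums to 1
    return w @ v, w, dist
```

Because the weights depend only on geodesic distance, every query puts its largest weight on its nearest key on the hyperboloid. That distance-driven behavior is the near-far routing the paragraph contrasts with an unrestricted dot-product mixer.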

Core kernel: Gibbs attention from negative Lorentz distance
Routing bias: Global aggregation at small radius, local focus at large radius
Interaction cost: Reduced from O(N^2) to O(NM + M^2) with an M-token core
Figure: Hyperbolic-distance attention block in the Lorentz model, which converts Lorentz geodesic distance into routing weights.
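To make the stated cost reduction concrete, the following back-of-the-envelope count compares O(N^2) with O(NM + M^2). It reads the N*M term as token-to-core exchange and the M^2 term as interaction within the core, which is an interpretation rather than something the page spells out. N = 32,000 matches the mesh scale quoted on this page; M = 64 is a purely hypothetical core size.

```python
def pairwise_interactions(n_tokens: int) -> int:
    """Number of token pairs touched by full attention: N^2."""
    return n_tokens * n_tokens

def core_routed_interactions(n_tokens: int, n_core: int) -> int:
    """Count for routing through an M-token core: N*M + M^2."""
    return n_tokens * n_core + n_core * n_core

N = 32_000   # mesh nodes per sample, as quoted on this page
M = 64       # hypothetical core size, not taken from the paper

full = pairwise_interactions(N)
routed = core_routed_interactions(N, M)
print(f"full: {full:,}  routed: {routed:,}  ratio: {full / routed:.1f}x")
```

The ratio scales roughly as N / M while M stays much smaller than N, so the saving grows with mesh size.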

Toy Experiment

Can attention fit a multiscale tree kernel?

The controlled toy experiment in the main paper uses graph diffusion on a complete binary tree. The target kernel is row-stochastic, with entries decaying with tree distance, so the problem exposes exactly the hierarchical near-far structure that FMM-style approximations exploit.

Figure: Toy multiscale tree-kernel fitting from the main paper, comparing ground truth with hyperbolic, dot-product, and MLP attention. Hyperbolic distance attention tracks the block-hierarchical target kernel more faithfully as N grows.
Target

Graph diffusion on tree leaves

A signal on the leaves is diffused by a kernel whose weights decay with shortest-path distance on the tree.

Structure

Discrete interaction scales

Shared ancestors induce nested blocks: close leaves need strong local coupling, while far leaves are grouped into coarser summaries.

Result

Hyperbolic fits the hierarchy

Dot-product attention misses the tree metric, while hyperbolic distance preserves the multiscale block structure expected from near-far physical approximation.
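A target kernel of the kind described above can be built directly. The sketch below computes shortest-path distances between the leaves of a complete binary tree and turns them into a row-stochastic kernel. The exponential decay `exp(-d / tau)` and the parameter `tau` are assumptions for illustration; the page only states that weights decay with tree distance. The function names are hypothetical.

```python
import numpy as np

def leaf_tree_distance(depth):
    """Shortest-path distance between leaves of a complete binary tree.

    Leaves are indexed 0 .. 2**depth - 1. The lowest common ancestor of
    leaves i and j sits at height bit_length(i XOR j), so d = 2 * height.
    """
    n = 2 ** depth
    idx = np.arange(n)
    xor = idx[:, None] ^ idx[None, :]
    height = np.zeros((n, n), dtype=int)
    while xor.any():                 # count shifts until every entry is zero
        height += (xor > 0)
        xor >>= 1
    return 2 * height

def diffusion_kernel(depth, tau=1.0):
    """Row-stochastic kernel with weights decaying in tree distance."""
    d = leaf_tree_distance(depth)
    k = np.exp(-d / tau)             # assumed decay profile
    return k / k.sum(axis=1, keepdims=True), d

kernel, dist = diffusion_kernel(depth=3)
signal = np.arange(8, dtype=float)
diffused = kernel @ signal           # diffuse a signal living on the leaves
```

The distance matrix takes only `depth + 1` distinct values, which produces the nested block pattern: siblings share the strongest coupling, and every leaf in the opposite half of the tree receives the same weak weight.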

Results

Lower relative l2 error across diverse PDE geometries.

HNO is evaluated against 19 neural operator baselines over point clouds, regular grids, and structured meshes.

Benchmark      Geometry         HNO     Second best  Reduction
Elasticity     Point Cloud      0.0037  0.0064       42.2%
Navier-Stokes  Regular Grid     0.0676  0.0892       24.2%
Darcy          Regular Grid     0.0045  0.0054       16.7%
Plasticity     Structured Mesh  0.0009  0.0012       25.0%
Airfoil        Structured Mesh  0.0048  0.0053       9.4%
Pipe           Structured Mesh  0.0027  0.0042       35.7%
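The reduction figures are consistent with (second best - HNO) / second best, which the short check below reproduces from the reported errors. The `relative_l2` helper shows the usual definition of the metric. The page does not say how errors are averaged across samples, so treat that helper as an assumed convention rather than the paper's exact evaluation code.

```python
import numpy as np

def relative_l2(pred, target):
    """Relative L2 error ||pred - target|| / ||target|| for one sample."""
    return np.linalg.norm(pred - target) / np.linalg.norm(target)

def relative_reduction(hno_err, second_best_err):
    """Percentage by which HNO lowers the second-best error."""
    return 100.0 * (second_best_err - hno_err) / second_best_err

# (HNO error, second-best error) as reported in the results table.
reported = {
    "Elasticity":    (0.0037, 0.0064),
    "Navier-Stokes": (0.0676, 0.0892),
    "Darcy":         (0.0045, 0.0054),
    "Plasticity":    (0.0009, 0.0012),
    "Airfoil":       (0.0048, 0.0053),
    "Pipe":          (0.0027, 0.0042),
}

reductions = {name: round(relative_reduction(h, s), 1)
              for name, (h, s) in reported.items()}
for name, value in reductions.items():
    print(f"{name:14s} {value:5.1f}%")
```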

Efficiency

Compact operators without giving up accuracy.

On Darcy, HNO keeps the model compact and reduces the memory and latency burden relative to transformer-style operator baselines.

Params 0.82M

vs. 2.83M+ for compared transformer baselines

VRAM 0.227 GB

vs. 2.18 GB for Transolver and 6.07 GB for Transolver++

Inference 4.47 ms/b

vs. 26.10 ms/b for Transolver on Darcy

Darcy Comparison

Accuracy, parameters, memory, and runtime

Lower is better
Method        Rel. l2  Params  VRAM      Train   Infer
HNO           0.00446  0.82M   0.227 GB  0.73 h  4.47 ms/b
Transolver    0.00567  2.83M   2.18 GB   2.76 h  26.10 ms/b
Transolver++  0.00560  2.84M   6.07 GB   3.12 h  74.77 ms/b
RNO           0.00535  2.83M   2.87 GB   5.26 h  28.40 ms/b
Figure: Representative qualitative PDEBench error maps for HNO vs. Transolver++.

Large-Scale CFD

Scaling to unstructured meshes with about 32k nodes.

HNO is also evaluated on ShapeNet Car and AirfRANS, where cleaner residual patterns and improved coefficient prediction support the same hierarchy-aware routing behavior at industrial mesh scale.

Figure: Velocity and surface-pressure prediction error maps for the large-scale CFD tasks, ShapeNet Car and AirfRANS.
Lift coefficient: HNO better on 75% of samples

Median absolute error: 0.029 to 0.012

Drag coefficient: HNO better on 65% of samples

Median relative error: 0.013 to 0.009

Figure: Per-sample lift and drag coefficient errors for ShapeNet Car and AirfRANS; points below the diagonal favor HNO.

Mechanism

Radius becomes an operational scale coordinate.

The learned hyperbolic radius is negatively correlated with attention entropy and receptive-field span: small-radius tokens aggregate globally, while large-radius tokens specialize locally.

Lower entropy at larger radius, shorter physical attention span, consistent head-wise trends
Figure: Mechanism verification on Elasticity, showing radius, attention entropy, and receptive field. It links hyperbolic radius to attention locality.
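The three diagnostics named above (radius, attention entropy, receptive-field span) can be computed as in the sketch below. The definitions are assumptions chosen for illustration: radius as geodesic distance from the hyperboloid origin, Shannon entropy of each attention row, and span as the attention-weighted mean physical distance. The data is synthetic and built so that larger-radius tokens attend more sharply. It only exercises the diagnostics and says nothing about HNO's measured behavior.

```python
import numpy as np

def hyperbolic_radius(x):
    """Geodesic distance from the hyperboloid origin: arccosh(x0)."""
    return np.arccosh(np.maximum(x[..., 0], 1.0))

def attention_entropy(w, eps=1e-12):
    """Shannon entropy of each attention row (higher means more global)."""
    return -(w * np.log(w + eps)).sum(axis=-1)

def attention_span(w, query_pos, key_pos):
    """Attention-weighted mean physical distance from query to keys."""
    gaps = np.linalg.norm(query_pos[:, None, :] - key_pos[None, :, :], axis=-1)
    return (w * gaps).sum(axis=-1)

# Synthetic setup: 8 tokens with increasing radius, 16 keys on a line.
radii = np.linspace(0.5, 3.0, 8)
tokens = np.stack([np.cosh(radii), np.sinh(radii)], axis=-1)  # on the hyperboloid
key_pos = np.linspace(0.0, 1.0, 16)[:, None]
query_pos = np.full((8, 1), 0.5)

# By construction, sharpness grows with radius (synthetic, not learned).
gaps = np.abs(query_pos - key_pos.T)
logits = -10.0 * radii[:, None] * gaps
w = np.exp(logits - logits.max(axis=-1, keepdims=True))
w = w / w.sum(axis=-1, keepdims=True)

r = hyperbolic_radius(tokens)
corr_entropy = np.corrcoef(r, attention_entropy(w))[0, 1]
corr_span = np.corrcoef(r, attention_span(w, query_pos, key_pos))[0, 1]
print(f"radius vs entropy: {corr_entropy:.2f}, radius vs span: {corr_span:.2f}")
```

On real attention maps, the same functions would be applied per head to the learned token radii and weights; negative correlations in both diagnostics would match the trend reported above.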

Citation

Reference

Jieyuan Pei, Zhuoxuan Li, Wei Li, Haobo Zhang, Jiawei Jiang, and Jianwei Zheng. Zhejiang University of Technology and Tongji University. Correspondence: Jianwei Zheng.

@inproceedings{hno2026,
  title     = {Hyperbolic Neural Operator},
  author    = {Pei, Jieyuan and Li, Zhuoxuan and Li, Wei and Zhang, Haobo and Jiang, Jiawei and Zheng, Jianwei},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  series    = {Proceedings of Machine Learning Research},
  publisher = {PMLR},
  year      = {2026},
  url       = {https://icml.cc/virtual/2026/poster/65554}
}