ICML 2026 / Neural Operators for PDEs

Hyperbolic Neural Operator

Jieyuan Pei, Zhuoxuan Li, Wei Li, Haobo Zhang, Jiawei Jiang, Jianwei Zheng

Zhejiang University of Technology / Zhejiang Key Laboratory of Visual Information Intelligent Processing / Tongji University

HNO reframes neural operator learning as learning the physical approximation structure of fast solvers: resolve near fields directly, compress far fields hierarchically.

6 PDE benchmarks / 2 large-scale CFD tasks / 32k mesh nodes per sample

What We Learn

Not just an infinite-dimensional map, but a fast physical approximation.

Classical neural operators are often described as learning a solution operator between function spaces. HNO shifts the emphasis: for PDE surrogate modeling, the key structure is the same one exploited by the Fast Multipole Method (FMM) and hierarchical matrices.

Nearby interactions are kept detailed; distant interactions are routed through compressed summaries. Hyperbolic geometry gives this near-far approximation a continuous, learnable coordinate system.
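To make the near-far split concrete, here is a minimal one-level sketch in NumPy. It assumes a 1D Coulomb-like kernel 1/|x - y| on [0, 1) with positive charges. Sources in a target's own box and its two neighbours are summed exactly; every other box is replaced by a single monopole (its total charge placed at its centre of charge). This is closer to a one-level Barnes-Hut scheme than to a full FMM, and the names `near_far_potential` and `n_boxes` are hypothetical, not taken from the HNO code.

```python
import numpy as np


def near_far_potential(x, y, q, n_boxes=64):
    """Approximate phi(x_i) = sum_j q_j / |x_i - y_j| for points in [0, 1).

    Near field (same or adjacent box): exact pairwise sum.
    Far field (all other boxes): one monopole per box at its centre of charge.
    Assumes positive charges q and targets x that never coincide with sources y.
    """
    box_x = np.minimum((x * n_boxes).astype(int), n_boxes - 1)
    box_y = np.minimum((y * n_boxes).astype(int), n_boxes - 1)

    # Compressed far-field summary: total charge and centre of charge per box.
    total = np.bincount(box_y, weights=q, minlength=n_boxes)
    moment = np.bincount(box_y, weights=q * y, minlength=n_boxes)
    midpoints = (np.arange(n_boxes) + 0.5) / n_boxes
    centre = np.divide(moment, total, out=midpoints.copy(), where=total > 0)

    phi = np.zeros_like(x)
    n_direct = 0
    boxes = np.arange(n_boxes)
    for i, (xi, bi) in enumerate(zip(x, box_x)):
        near = np.abs(box_y - bi) <= 1          # sources handled exactly
        phi[i] = np.sum(q[near] / np.abs(xi - y[near]))
        n_direct += int(near.sum())
        far = np.abs(boxes - bi) > 1            # boxes handled by their summary
        phi[i] += np.sum(total[far] / np.abs(xi - centre[far]))
    return phi, n_direct


rng = np.random.default_rng(0)
y, q = rng.random(2000), rng.random(2000)   # sources and positive charges
x = rng.random(200)                          # targets
phi_fast, n_direct = near_far_potential(x, y, q)
phi_exact = np.sum(q[None, :] / np.abs(x[:, None] - y[None, :]), axis=1)
rel_err = np.linalg.norm(phi_fast - phi_exact) / np.linalg.norm(phi_exact)
print(f"relative error {rel_err:.1e}; exact pairs {n_direct} of {x.size * y.size}")
```

Only a small fraction of pairs is evaluated exactly, yet the error stays small because distant boxes are smooth enough to be summarized.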

Figure: near field, far field, hierarchical tree, and hyperbolic geometry. HNO targets the same near-far approximation pattern used by FMM and hierarchical matrices: precise local interactions plus compressed far-field summaries.

Why Hyperbolic Geometry

Physical interactions are not uniformly expensive.

Near-far decomposition

Nearby interactions need fine precision, while distant regions are often compressible into lower-rank summaries.
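The compressibility claim can be checked numerically. The sketch below (our illustration, using a 1/|x - y| kernel as an assumed stand-in for a physical interaction) compares the numerical rank of an interaction block between adjacent clusters with that of a block between well-separated clusters; `numerical_rank` and its tolerance are hypothetical choices.

```python
import numpy as np


def kernel(a, b):
    """Pairwise 1/|a_i - b_j| interaction matrix (a and b must not overlap)."""
    return 1.0 / np.abs(a[:, None] - b[None, :])


def numerical_rank(block, rel_tol=1e-8):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(block, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))


x = np.linspace(0.0, 1.0, 200)          # target cluster
y_near = np.linspace(1.05, 2.05, 200)   # adjacent source cluster
y_far = np.linspace(10.0, 11.0, 200)    # well-separated source cluster

rank_near = numerical_rank(kernel(x, y_near))
rank_far = numerical_rank(kernel(x, y_far))
print(f"numerical rank: near block {rank_near}, far block {rank_far} (of 200)")
```

The far block is captured by a handful of singular vectors, which is what lets FMM-style methods replace it with a short summary.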

Tree-like structure

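Fast solvers exploit recursive hierarchies, but Euclidean space distorts exponentially branching structures: the number of tree nodes grows exponentially with depth, while the room available at a given Euclidean radius grows only polynomially.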
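This distortion has a standard quantitative form (a textbook fact about hyperbolic space, not a result from the paper). Let C(r) be the circumference of a circle of radius r in the Euclidean plane and in the hyperbolic plane of curvature -1, and let r(ℓ) be the radius needed to place the 2^ℓ depth-ℓ nodes of a complete binary tree on that circle with a fixed arc spacing:

```latex
\begin{aligned}
C_{\mathbb{E}}(r) &= 2\pi r,
  & C_{\mathbb{H}}(r) &= 2\pi \sinh r \sim \pi e^{r} \quad (r \to \infty), \\
r_{\mathbb{E}}(\ell) &\propto 2^{\ell},
  & r_{\mathbb{H}}(\ell) &\approx \ell \ln 2 + \text{const}.
\end{aligned}
```

In hyperbolic space, tree depth maps to radius roughly linearly, whereas a Euclidean embedding must stretch exponentially with depth.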

Lorentz routing

HNO defines a stabilized Gibbs kernel from hyperbolic geodesic distances, giving attention an explicit scale coordinate.

Method

Attention as a continuous hyperbolic kernel.

Query and key features are lifted to the Lorentz hyperboloid. Their stabilized geodesic distance is converted into attention weights, so the model learns a near-far routing kernel rather than an unrestricted Euclidean token mixer.
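One standard way to write this pipeline is shown below. We assume curvature -c, a lift that keeps the Euclidean feature v as the spatial coordinates and solves for x_0, a clamp ε as the stabilization, and a temperature τ. These specific choices are our assumptions, and the paper's exact parameterization may differ.

```latex
\begin{aligned}
\mathbf{x} &= \Bigl(\sqrt{1/c + \lVert \mathbf{v} \rVert^{2}},\ \mathbf{v}\Bigr),
  \qquad \langle \mathbf{x}, \mathbf{x} \rangle_{\mathcal{L}} = -1/c, \\
\langle \mathbf{x}, \mathbf{y} \rangle_{\mathcal{L}} &= -x_0 y_0 + \sum_{i=1}^{d} x_i y_i, \\
d_{\mathcal{L}}(\mathbf{x}, \mathbf{y}) &= \frac{1}{\sqrt{c}}
  \operatorname{arccosh}\Bigl(\max\bigl(-c\,\langle \mathbf{x}, \mathbf{y} \rangle_{\mathcal{L}},\ 1 + \epsilon\bigr)\Bigr), \\
A_{ij} &= \frac{\exp\bigl(-d_{\mathcal{L}}(\mathbf{q}_i, \mathbf{k}_j)/\tau\bigr)}
  {\sum_{l} \exp\bigl(-d_{\mathcal{L}}(\mathbf{q}_i, \mathbf{k}_l)/\tau\bigr)}.
\end{aligned}
```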

Core kernel: Gibbs attention from negative Lorentz distance
Routing bias: Global aggregation at small radius, local focus at large radius
Interaction cost: Reduced from O(N^2) to O(NM + M^2) for N tokens and an M-token core (M < N)
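The kernel and the reduced cost can be sketched together in NumPy. This is a hedged illustration, not HNO's confirmed architecture. The gather / mix / scatter pattern through M core tokens is a Perceiver-style stand-in. The `core` tokens are random placeholders for however the paper forms its M-token core. Learned query/key/value projections are omitted. The functions `lorentz_lift`, `gibbs_weights`, and `core_routing` are hypothetical names. The point is that no N x N matrix is ever formed.

```python
import numpy as np


def lorentz_lift(v, c=1.0):
    """Lift Euclidean features v of shape (n, d) onto the hyperboloid <x, x>_L = -1/c."""
    x0 = np.sqrt(1.0 / c + np.sum(v * v, axis=-1, keepdims=True))
    return np.concatenate([x0, v], axis=-1)


def gibbs_weights(q, k, c=1.0, tau=1.0, eps=1e-6):
    """Row-stochastic weights A_ij proportional to exp(-d_L(q_i, k_j) / tau), shape (len(q), len(k))."""
    x, y = lorentz_lift(q, c), lorentz_lift(k, c)
    inner = -np.outer(x[:, 0], y[:, 0]) + x[:, 1:] @ y[:, 1:].T        # Lorentzian inner product
    dist = np.arccosh(np.maximum(-c * inner, 1.0 + eps)) / np.sqrt(c)  # clamp keeps arccosh well defined
    logits = -dist / tau
    logits -= logits.max(axis=1, keepdims=True)                         # usual softmax shift
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def core_routing(tokens, core, c=1.0, tau=1.0):
    """Route N tokens through M core tokens without ever forming an N x N matrix."""
    gather = gibbs_weights(core, tokens, c, tau)       # (M, N): O(NM)
    summaries = gather @ tokens                          # (M, d) compressed summaries
    mix = gibbs_weights(summaries, summaries, c, tau)    # (M, M): O(M^2)
    mixed = mix @ summaries                              # (M, d)
    scatter = gibbs_weights(tokens, mixed, c, tau)       # (N, M): O(NM)
    return scatter @ mixed, (gather, mix, scatter)


rng = np.random.default_rng(0)
N, M, d = 4096, 64, 8
tokens = rng.normal(size=(N, d))   # stands in for projected query/key/value features
core = rng.normal(size=(M, d))     # hypothetical core tokens (learned in a real model)
out, (gather, mix, scatter) = core_routing(tokens, core)
print(out.shape, gather.shape, mix.shape, scatter.shape)
# (4096, 8) (64, 4096) (64, 64) (4096, 64)
```

The largest matrices are N x M, so memory and time scale as O(NM + M^2) rather than O(N^2).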

Figure: Hyperbolic-distance attention block in the Lorentz model, which converts Lorentz geodesic distance into routing weights.

Toy Experiment

Can attention fit a multiscale tree kernel?

The controlled toy in the main paper uses graph diffusion on a complete binary tree. The target kernel is row-stochastic, with entries decaying with tree distance, so the problem exposes exactly the hierarchical near-far structure that FMM-style approximations exploit.
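A stand-in for this kind of target kernel takes a few lines to build. We assume weights exp(-β d) in the leaf-to-leaf tree distance d, normalized per row. The paper's graph-diffusion construction may differ, and `tree_kernel`, `leaf_distance`, and `beta` are hypothetical names.

```python
import numpy as np


def leaf_distance(i, j):
    """Shortest-path distance between leaves i and j of a complete binary tree.

    With leaves numbered 0 .. 2**depth - 1 from left to right, the lowest common
    ancestor sits (i ^ j).bit_length() levels above the leaves, and the path
    climbs up to it and back down.
    """
    return 2 * (i ^ j).bit_length()


def tree_kernel(depth, beta=1.0):
    """Row-stochastic kernel whose weights decay as exp(-beta * tree distance)."""
    n = 2 ** depth
    dist = np.array([[leaf_distance(i, j) for j in range(n)] for i in range(n)], dtype=float)
    weights = np.exp(-beta * dist)
    return weights / weights.sum(axis=1, keepdims=True)


K = tree_kernel(depth=4)          # 16 leaves
print(np.round(K[0], 4))          # one row: nested blocks of width 1, 1, 2, 4, 8
```

Each row takes only depth + 1 distinct values, one per shared-ancestor level. These are the discrete interaction scales that a near-far approximation has to respect.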

Figure: Main-paper toy, multiscale tree-kernel fitting, comparing the ground-truth kernel with hyperbolic, dot-product, and MLP attention. Hyperbolic-distance attention tracks the block-hierarchical target kernel more faithfully as N grows.

Target

Graph diffusion on tree leaves

A signal on the leaves is diffused by a kernel whose weights decay with shortest-path distance on the tree.

Structure

Discrete interaction scales

Shared ancestors induce nested blocks: close leaves need strong local coupling, while far leaves are grouped into coarser summaries.

Result

Hyperbolic fits the hierarchy

Dot-product attention misses the tree metric, while hyperbolic distance preserves the multiscale block structure expected from near-far physical approximation.

Results

Lower relative L2 error across diverse PDE geometries.

HNO is evaluated against 19 neural operator baselines over point clouds, regular grids, and structured meshes.

Benchmark	Geometry	HNO (rel. L2)	Second best (rel. L2)	Reduction
Elasticity	Point Cloud	0.0037	0.0064	42.2%
Navier-Stokes	Regular Grid	0.0676	0.0892	24.2%
Darcy	Regular Grid	0.0045	0.0054	16.7%
Plasticity	Structured Mesh	0.0009	0.0012	25.0%
Airfoil	Structured Mesh	0.0048	0.0053	9.4%
Pipe	Structured Mesh	0.0027	0.0042	35.7%
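The Reduction column matches the relative improvement over the second-best model, (second best - HNO) / second best. The short check below reproduces every entry from the rounded errors in the table. `relative_reduction` is a hypothetical helper name.

```python
# (HNO, second-best) relative L2 errors copied from the results table.
results = {
    "Elasticity": (0.0037, 0.0064),
    "Navier-Stokes": (0.0676, 0.0892),
    "Darcy": (0.0045, 0.0054),
    "Plasticity": (0.0009, 0.0012),
    "Airfoil": (0.0048, 0.0053),
    "Pipe": (0.0027, 0.0042),
}


def relative_reduction(hno, second_best):
    """Percentage by which HNO's error falls below the second-best baseline's error."""
    return 100.0 * (second_best - hno) / second_best


for name, (hno, second) in results.items():
    print(f"{name:<14} {relative_reduction(hno, second):5.1f}%")
```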

Efficiency

Compact operators without giving up accuracy.

On Darcy, HNO keeps the model compact and reduces the memory and latency burden relative to transformer-style operator baselines.

Params 0.82M

vs. 2.83M+ for compared transformer baselines

VRAM 0.227 GB

vs. 2.18 GB for Transolver and 6.07 GB for Transolver++

Inference 4.47 ms/b

vs. 26.10 ms/b for Transolver on Darcy

Darcy Comparison

Accuracy, parameters, memory, and runtime

Lower is better

Model	Rel. L2	Params	VRAM	Runtime	Inference
HNO	0.00446	0.82M	0.227 GB	0.73 h	4.47 ms/b
Transolver	0.00567	2.83M	2.18 GB	2.76 h	26.10 ms/b
Transolver++	0.00560	2.84M	6.07 GB	3.12 h	74.77 ms/b
RNO	0.00535	2.83M	2.87 GB	5.26 h	28.40 ms/b
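The headline factors follow directly from the Darcy comparison. The sketch below divides each baseline's value by HNO's, so a ratio above 1 means HNO is lower. `ratios_vs_hno` and the dictionary keys are hypothetical names, and `hours` simply mirrors the runtime column as reported.

```python
# Darcy comparison values copied from the table (lower is better in every column).
darcy = {
    "HNO": {"rel_l2": 0.00446, "params_m": 0.82, "vram_gb": 0.227, "hours": 0.73, "infer_ms": 4.47},
    "Transolver": {"rel_l2": 0.00567, "params_m": 2.83, "vram_gb": 2.18, "hours": 2.76, "infer_ms": 26.10},
    "Transolver++": {"rel_l2": 0.00560, "params_m": 2.84, "vram_gb": 6.07, "hours": 3.12, "infer_ms": 74.77},
    "RNO": {"rel_l2": 0.00535, "params_m": 2.83, "vram_gb": 2.87, "hours": 5.26, "infer_ms": 28.40},
}


def ratios_vs_hno(table, metric):
    """How many times larger each baseline's value is than HNO's for one metric."""
    base = table["HNO"][metric]
    return {name: row[metric] / base for name, row in table.items() if name != "HNO"}


for metric in ("params_m", "vram_gb", "infer_ms"):
    print(metric, {name: f"{r:.1f}x" for name, r in ratios_vs_hno(darcy, metric).items()})
```

For example, Transolver uses about 3.5x the parameters, 9.6x the VRAM, and 5.8x the inference time of HNO on Darcy.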

Figure: Representative PDEBench error maps for HNO vs. Transolver++.

Large-Scale CFD

Scaling to unstructured meshes with about 32k nodes.

HNO is also evaluated on ShapeNet Car and AirfRANS. On both, cleaner residual error patterns and more accurate lift and drag coefficient predictions indicate that hierarchy-aware routing remains effective at industrial mesh scale.

Figure: Velocity and surface-pressure error maps on ShapeNet Car and AirfRANS.

Figure: Per-sample lift and drag coefficient errors on ShapeNet Car and AirfRANS; points below the diagonal favor HNO.

Mechanism

Radius becomes an operational scale coordinate.

The learned hyperbolic radius is negatively correlated with attention entropy and receptive-field span: small-radius tokens aggregate globally, while large-radius tokens specialize locally.
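The geometric intuition behind this trend can be checked with a toy that involves no learning. This is our illustration, not the paper's measurement. A query at the origin is equidistant from keys spread on a circle, so its Gibbs weights are uniform and its entropy is maximal. Moving the query outward makes some keys much closer than others and concentrates the weights. `point_at` and `attention_entropy` are hypothetical helpers, and the 2D hyperboloid with curvature -1 is an assumption chosen for simplicity.

```python
import numpy as np


def point_at(radius, angle):
    """Point on the 2D Lorentz hyperboloid (curvature -1) at geodesic distance `radius` from the origin."""
    return np.array([np.cosh(radius),
                     np.sinh(radius) * np.cos(angle),
                     np.sinh(radius) * np.sin(angle)])


def attention_entropy(query, keys, tau=1.0):
    """Entropy (nats) of Gibbs weights proportional to exp(-d_L(query, key_j) / tau)."""
    inner = -query[0] * keys[:, 0] + keys[:, 1:] @ query[1:]   # Lorentzian inner products
    dist = np.arccosh(np.maximum(-inner, 1.0))                  # clamp guards rounding below 1
    logits = -dist / tau
    w = np.exp(logits - logits.max())
    w /= w.sum()
    w = w[w > 0]                                                 # treat 0 * log 0 as 0
    return float(-np.sum(w * np.log(w)))


# 64 keys evenly spread on the circle of hyperbolic radius 1.
angles = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
keys = np.stack([point_at(1.0, a) for a in angles])

radii = [0.0, 0.5, 1.0, 2.0, 3.0]
entropies = [attention_entropy(point_at(r, 0.3), keys) for r in radii]
for r, h in zip(radii, entropies):
    print(f"query radius {r:.1f}: entropy {h:.3f} nats (uniform = {np.log(len(keys)):.3f})")
```

At radius 0 the entropy equals log 64, the global-aggregation limit. Every query placed away from the origin has strictly lower entropy.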

Lower entropy at larger radius / Shorter physical attention span / Consistent head-wise trends

Figure: Elasticity mechanism verification relating hyperbolic radius to attention entropy and receptive field, linking radius to attention locality.

Citation

Reference

Jieyuan Pei, Zhuoxuan Li, Wei Li, Haobo Zhang, Jiawei Jiang, and Jianwei Zheng. Zhejiang University of Technology, Zhejiang Key Laboratory of Visual Information Intelligent Processing, and Tongji University. Correspondence: Jianwei Zheng.

@inproceedings{hno2026,
  title     = {Hyperbolic Neural Operator},
  author    = {Pei, Jieyuan and Li, Zhuoxuan and Li, Wei and Zhang, Haobo and Jiang, Jiawei and Zheng, Jianwei},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  series    = {Proceedings of Machine Learning Research},
  publisher = {PMLR},
  year      = {2026},
  url       = {https://icml.cc/virtual/2026/poster/65554}
}