[MISC] Speedup tiled hessian kernel by using direct lower-triangle indexing. by hughperkins · Pull Request #2618 · Genesis-Embodied-AI/Genesis

hughperkins · 2026-03-28T20:42:12Z

For diagonal tiles, iterate over exactly n_lower_tri elements using linear_to_lower_tri instead of n_dofs^2 with an upper-triangle skip. For n_dofs=62 this halves the outer loop iterations (1953 vs 3844) and eliminates warp divergence from the i_d1 >= i_d2 branch.
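
As an illustration of the index mapping described above, here is a minimal CPU-side sketch in plain Python. The name `linear_to_lower_tri` comes from this PR, but the body below is an assumption about how such a mapping can be written (row-major order over the lower triangle, diagonal included). It is not the kernel code itself.

```python
import math


def linear_to_lower_tri(k: int) -> tuple[int, int]:
    """Map a linear index k >= 0 to (row, col) with row >= col.

    Assumed ordering: (0,0), (1,0), (1,1), (2,0), (2,1), (2,2), ...
    """
    # Largest row such that row * (row + 1) / 2 <= k, computed with exact
    # integer arithmetic to avoid floating-point rounding near boundaries.
    row = (math.isqrt(8 * k + 1) - 1) // 2
    col = k - row * (row + 1) // 2
    return row, col


n_dofs = 62
n_lower_tri = n_dofs * (n_dofs + 1) // 2  # 1953 for n_dofs = 62
pairs = [linear_to_lower_tri(k) for k in range(n_lower_tri)]
```

Every index in `range(n_lower_tri)` lands on a distinct lower-triangle entry, so no iteration is spent on an element that would be skipped.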

Description

Related Issue

Resolves Genesis-Embodied-AI/Genesis#

Motivation and Context

How Has This Been / Can This Be Tested?

Screenshots (if appropriate):

Checklist:

I read the CONTRIBUTING document.
I followed the Submitting Code Changes section of CONTRIBUTING document.
I tagged the title correctly (including BUG FIX/FEATURE/MISC/BREAKING)
I updated the documentation accordingly or no change is needed.
I tested my changes and added instructions on how to test it for reviewers.

I have added tests to cover my changes.
All new and existing tests passed.

hughperkins · 2026-03-28T21:40:37Z

Here are the results for this change:

duburcqa · 2026-03-28T21:54:21Z

-                    pid = tid
-                    numel = n_dofs_tile_row * n_dofs_tile_col
-                    while pid < numel:
-                        i_d1_ = pid // n_dofs_tile_col
-                        i_d2_ = pid % n_dofs_tile_col
-                        i_d1 = i_d1_ + i_d1_start
-                        i_d2 = i_d2_ + i_d2_start
-                        if i_d1 >= i_d2:


For context, doing this was necessary because of a bug on Apple Metal GPUs, which I think has since been fixed.
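
For reference, the removed pattern quoted above and the direct-indexing replacement can be compared with a small CPU emulation. This is only a sketch under assumptions: the thread count, the `pid += n_threads` stride, and the helper bodies are hypothetical, and a single diagonal tile with equal row and column sizes is assumed.

```python
import math


def linear_to_lower_tri(k):
    # Assumed row-major mapping over the lower triangle (diagonal included).
    row = (math.isqrt(8 * k + 1) - 1) // 2
    return row, k - row * (row + 1) // 2


def visit_with_skip(n_tile, start, n_threads):
    """Old pattern: walk all n_tile^2 elements, skip the upper triangle."""
    visited, iterations = [], 0
    for tid in range(n_threads):  # emulate each thread in turn
        pid = tid
        while pid < n_tile * n_tile:
            iterations += 1
            i_d1 = pid // n_tile + start
            i_d2 = pid % n_tile + start
            if i_d1 >= i_d2:  # the divergent branch
                visited.append((i_d1, i_d2))
            pid += n_threads
    return sorted(visited), iterations


def visit_direct(n_tile, start, n_threads):
    """New pattern: walk exactly the lower-triangle elements."""
    visited, iterations = [], 0
    n_lower_tri = n_tile * (n_tile + 1) // 2
    for tid in range(n_threads):
        pid = tid
        while pid < n_lower_tri:
            iterations += 1
            r, c = linear_to_lower_tri(pid)
            visited.append((r + start, c + start))
            pid += n_threads
    return sorted(visited), iterations


pairs_skip, iters_skip = visit_with_skip(62, 0, 64)
pairs_direct, iters_direct = visit_direct(62, 0, 64)
print(iters_skip, iters_direct, pairs_skip == pairs_direct)
```

Both variants visit the same set of (i_d1, i_d2) pairs; only the number of loop iterations differs.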

Could you run all the unit tests and all benchmarks on Apple Metal just to be sure?

like:

pytest -sv tests/test_rigid_physics.py --backend gpu

?

Tests passed:

trying now

pytest -sv -m benchmarks tests/test_rigid_benchmarks.py --backend gpu

Tests failing only on main, or only on the branch:

Only on main (1):
• test_kinematic_contact_probe_box_support[0] — f64 not supported

Only on branch (2):
• test_lidar_cache_offset_parallel_env — f64 not supported
• test_elastomer_displacement_sensor_box_sphere[2] — f64 not supported

The failure messages for those two above:

FAILED tests/test_sensors.py::test_lidar_cache_offset_parallel_env - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
FAILED tests/test_sensors.py::test_elastomer_displacement_sensor_box_sphere[2] - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
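
A comparison like the "only on main / only on branch" split above can be reproduced from two pytest summaries with a few lines of Python. This is a sketch, not the script that was actually used here. The two summary strings are made-up placeholders, and the regex assumes the `FAILED <test id> - <message>` line shape seen in this thread.

```python
import re

# Non-greedy match up to the first " - " so that parametrized ids containing
# spaces, e.g. test_x[(10, 40, 25)], are captured whole.
FAILED_RE = re.compile(r"^FAILED (.+?) - ", re.MULTILINE)


def failed_tests(summary: str) -> set[str]:
    """Extract the set of failing test ids from pytest short-summary text."""
    return set(FAILED_RE.findall(summary))


# Placeholder summaries (hypothetical test ids, for illustration only).
main_summary = """\
FAILED tests/test_a.py::test_one[0] - RuntimeError: Type f64 not supported.
FAILED tests/test_a.py::test_shared[(10, 40, 25)] - UserWarning: resized
"""
branch_summary = """\
FAILED tests/test_a.py::test_shared[(10, 40, 25)] - UserWarning: resized
FAILED tests/test_b.py::test_two - RuntimeError: Type f64 not supported.
"""

only_on_main = failed_tests(main_summary) - failed_tests(branch_summary)
only_on_branch = failed_tests(branch_summary) - failed_tests(main_summary)
print(sorted(only_on_main), sorted(only_on_branch))
```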

(The full list of failures for branch:

FAILED tests/test_grad.py::test_differentiable_rigid[gpu] - genesis.GenesisException: Nan grad in qpos or dofs_vel found at step 95
FAILED tests/test_recorders.py::test_plotter - genesis.GenesisException: PyAV is not installed. Please install it with `pip install av`.
FAILED tests/test_recorders.py::test_file_writers - AssertionError: assert '1' in ('False', '0')
FAILED tests/test_recorders.py::test_video_writer - genesis.GenesisException: PyAV is not installed. Please install it with `pip install av`.
FAILED tests/test_sensors.py::test_raycaster_hits[2] - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
FAILED tests/test_sensors.py::test_elastomer_displacement_sensor_sphere_ground[2] - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
FAILED tests/test_sensors.py::test_raycaster_hits[0] - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
FAILED tests/test_utils.py::test_geom_numpy_vs_torch_consistency[()] - UserWarning: The operator 'aten::linalg_svd' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:15.)
FAILED tests/test_sensors.py::test_elastomer_displacement_sensor_sphere_ground[0] - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
FAILED tests/test_utils.py::test_geom_numpy_vs_torch_consistency[(10, 40, 25)] - UserWarning: An output with one or more elements was resized since it had shape [10, 40, 25, 3, 3], which does not match the required output shape [10000, 3, 3]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/Resize.cpp:38.)
FAILED tests/test_sensors.py::test_contact_sensors_gravity_force[2] - AssertionError: ContactSensor for floor should not detect any contact yet. assert not tensor(True, device='mps:0') + where tensor(True, device='mps:0') = <built-in method any of Tensor object at 0x152753060>() + where <built-in method any of Tensor object at 0x152753060> = tensor([[1],\n [0]], device='mps:0', dtype=torch.int32).any + where tensor([[1],\n [0]], device='mps:0', dtype=torch.int32) = read() + where read = \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500 <gs.engine.sensors.contact_force.ContactSensor> \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n'is_built': <bool>: True\n.read
FAILED tests/test_sensors.py::test_lidar_bvh_parallel_env - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
FAILED tests/test_sensors.py::test_add_and_read_all_registered_sensors - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
FAILED tests/test_sensors.py::test_temperature_grid_sensor_contact_and_reset[2] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
FAILED tests/test_sensors.py::test_elastomer_displacement_sensor_box_sphere[0] - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
FAILED tests/test_sensors.py::test_temperature_grid_simulate_all_link_temps[2] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
FAILED tests/test_sensors.py::test_temperature_grid_simulate_all_link_temps[0] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
FAILED tests/test_sensors.py::test_lidar_cache_offset_parallel_env - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
FAILED tests/test_sensors.py::test_elastomer_displacement_sensor_box_sphere[2] - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
FAILED tests/test_sensors.py::test_contact_sensors_gravity_force[0] - AssertionError: ContactSensor for floor should not detect any contact yet. assert not tensor(True, device='mps:0') + where tensor(True, device='mps:0') = <built-in method any of Tensor object at 0x158336d90>() + where <built-in method any of Tensor object at 0x158336d90> = tensor([1], device='mps:0', dtype=torch.int32).any + where tensor([1], device='mps:0', dtype=torch.int32) = read() + where read = \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500 <gs.engine.sensors.contact_force.ContactSensor> \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n'is_built': <bool>: True\n.read
================================================= 20 failed, 472 passed, 71 skipped, 1 xfailed, 1687 warnings in 883.56s (0:14:43) ===

)

rebased. when you say reinstall you mean [...]

No, I mean uv pip install -e '.[dev]'

I see. You want me to rerun either or both of main and/or branch? (45 minutes each...)

github-actions · 2026-03-29T01:16:20Z

⚠️ Abnormal Benchmark Result Detected ➡️ Report

hughperkins · 2026-03-30T20:41:36Z

🙌

github-actions · 2026-03-30T22:22:50Z

⚠️ Abnormal Benchmark Result Detected ➡️ Report

duburcqa reviewed Mar 28, 2026

hughperkins force-pushed the hp/hess-half1 branch from 36dbd4f to 24522f0 Compare March 30, 2026 18:40

hughperkins marked this pull request as ready for review March 30, 2026 19:06

hughperkins requested a review from YilingQiao as a code owner March 30, 2026 19:06

duburcqa changed the title ~~[MISC] Use direct lower-triangle indexing in hessian tiled kernel.~~ [MISC] Speedup tiled hessian kernel by using direct lower-triangle indexing. Mar 30, 2026

duburcqa merged commit dbd7cbf into Genesis-Embodied-AI:main Mar 30, 2026
22 checks passed

hughperkins deleted the hp/hess-half1 branch March 30, 2026 20:41

2 participants