[MISC] Speedup tiled hessian kernel by using direct lower-triangle indexing. #2618
pid = tid
numel = n_dofs_tile_row * n_dofs_tile_col
while pid < numel:
    i_d1_ = pid // n_dofs_tile_col
    i_d2_ = pid % n_dofs_tile_col
    i_d1 = i_d1_ + i_d1_start
    i_d2 = i_d2_ + i_d2_start
    if i_d1 >= i_d2:
For the record, this was necessary because of a bug on the Apple Metal GPU backend, which I think has since been fixed.
Could you run all the unit tests and all benchmarks on Apple Metal just to be sure?
like:
pytest -sv tests/test_rigid_physics.py --backend gpu
?
trying now
pytest -sv -m benchmarks tests/test_rigid_benchmarks.py --backend gpu
Tests failing only on main, or only on the branch:
Only on main (1):
• test_kinematic_contact_probe_box_support[0] — f64 not supported
Only on branch (2):
• test_lidar_cache_offset_parallel_env — f64 not supported
• test_elastomer_displacement_sensor_box_sphere[2] — f64 not supported
The failure messages for those two above:
FAILED tests/test_sensors.py::test_lidar_cache_offset_parallel_env - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
FAILED tests/test_sensors.py::test_elastomer_displacement_sensor_box_sphere[2] - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
(The full list of failures for the branch:
FAILED tests/test_grad.py::test_differentiable_rigid[gpu] - genesis.GenesisException: Nan grad in qpos or dofs_vel found at step 95
FAILED tests/test_recorders.py::test_plotter - genesis.GenesisException: PyAV is not installed. Please install it with `pip install av`.
FAILED tests/test_recorders.py::test_file_writers - AssertionError: assert '1' in ('False', '0')
FAILED tests/test_recorders.py::test_video_writer - genesis.GenesisException: PyAV is not installed. Please install it with `pip install av`.
FAILED tests/test_sensors.py::test_raycaster_hits[2] - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
FAILED tests/test_sensors.py::test_elastomer_displacement_sensor_sphere_ground[2] - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
FAILED tests/test_sensors.py::test_raycaster_hits[0] - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
FAILED tests/test_utils.py::test_geom_numpy_vs_torch_consistency[()] - UserWarning: The operator 'aten::linalg_svd' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:15.)
FAILED tests/test_sensors.py::test_elastomer_displacement_sensor_sphere_ground[0] - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
FAILED tests/test_utils.py::test_geom_numpy_vs_torch_consistency[(10, 40, 25)] - UserWarning: An output with one or more elements was resized since it had shape [10, 40, 25, 3, 3], which does not match the required output shape [10000, 3, 3]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/Resize.cpp:38.)
FAILED tests/test_sensors.py::test_contact_sensors_gravity_force[2] - AssertionError: ContactSensor for floor should not detect any contact yet.
assert not tensor(True, device='mps:0')
+ where tensor(True, device='mps:0') = <built-in method any of Tensor object at 0x152753060>()
+ where <built-in method any of Tensor object at 0x152753060> = tensor([[1],\n [0]], device='mps:0', dtype=torch.int32).any
+ where tensor([[1],\n [0]], device='mps:0', dtype=torch.int32) = read()
+ where read = \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500 <gs.engine.sensors.contact_force.ContactSensor> \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n'is_built': <bool>: True\n.read
FAILED tests/test_sensors.py::test_lidar_bvh_parallel_env - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
FAILED tests/test_sensors.py::test_add_and_read_all_registered_sensors - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
FAILED tests/test_sensors.py::test_temperature_grid_sensor_contact_and_reset[2] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
FAILED tests/test_sensors.py::test_elastomer_displacement_sensor_box_sphere[0] - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
FAILED tests/test_sensors.py::test_temperature_grid_simulate_all_link_temps[2] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
FAILED tests/test_sensors.py::test_temperature_grid_simulate_all_link_temps[0] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
FAILED tests/test_sensors.py::test_lidar_cache_offset_parallel_env - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
FAILED tests/test_sensors.py::test_elastomer_displacement_sensor_box_sphere[2] - RuntimeError: [spirv_ir_builder.cpp:get_primitive_type@307] Type f64 not supported.
FAILED tests/test_sensors.py::test_contact_sensors_gravity_force[0] - AssertionError: ContactSensor for floor should not detect any contact yet.
assert not tensor(True, device='mps:0')
+ where tensor(True, device='mps:0') = <built-in method any of Tensor object at 0x158336d90>()
+ where <built-in method any of Tensor object at 0x158336d90> = tensor([1], device='mps:0', dtype=torch.int32).any
+ where tensor([1], device='mps:0', dtype=torch.int32) = read()
+ where read = \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500 <gs.engine.sensors.contact_force.ContactSensor> \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n'is_built': <bool>: True\n.read
================================================= 20 failed, 472 passed, 71 skipped, 1 xfailed, 1687 warnings in 883.56s (0:14:43) ===
)
rebased. when you say reinstall you mean [...]
No, I mean uv pip install -e '.[dev]'
I see. Do you want me to rerun main, the branch, or both? (45 minutes each...)
For diagonal tiles, iterate over exactly n_lower_tri elements using linear_to_lower_tri instead of n_dofs^2 with an upper-triangle skip. For n_dofs=62 this halves the outer loop iterations (1953 vs 3844) and eliminates warp divergence from the i_d1 >= i_d2 branch.
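To make the iteration counts above concrete, here is a minimal pure-Python sketch of the two loop shapes. It is not the kernel code: `linear_to_lower_tri` below is a hypothetical stand-in for the helper of the same name in the PR, written with exact integer arithmetic (`math.isqrt`), and the two `lower_tri_pairs_*` functions are illustrative names only. The real helper may compute the index differently.

```python
import math


def linear_to_lower_tri(k: int) -> tuple[int, int]:
    """Map a linear index k to (row, col) with row >= col.

    Hypothetical stand-in for the PR's helper. Row i of the lower triangle
    starts at offset i * (i + 1) // 2, so i is the largest integer
    satisfying i * (i + 1) // 2 <= k.
    """
    i = (math.isqrt(8 * k + 1) - 1) // 2
    j = k - i * (i + 1) // 2
    return i, j


def lower_tri_pairs_skip(n: int) -> tuple[list[tuple[int, int]], int]:
    """Old shape: walk all n * n entries and skip the upper triangle."""
    pairs, iters = [], 0
    for pid in range(n * n):
        iters += 1
        i, j = divmod(pid, n)
        if i >= j:  # the branch that roughly half of the threads fail
            pairs.append((i, j))
    return pairs, iters


def lower_tri_pairs_direct(n: int) -> tuple[list[tuple[int, int]], int]:
    """New shape: walk exactly n * (n + 1) // 2 entries, no branch."""
    n_lower_tri = n * (n + 1) // 2
    pairs = [linear_to_lower_tri(k) for k in range(n_lower_tri)]
    return pairs, n_lower_tri


if __name__ == "__main__":
    n_dofs = 62
    skip_pairs, skip_iters = lower_tri_pairs_skip(n_dofs)
    direct_pairs, direct_iters = lower_tri_pairs_direct(n_dofs)
    print(skip_iters, direct_iters, skip_pairs == direct_pairs)
```

Both shapes visit the same (row, col) pairs in the same row-major order; only the number of loop iterations differs.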
36dbd4f to
24522f0
🙌