Replace the sequential per-env linesearch (_kernel_linesearch) with a
parallel linesearch pipeline using 6 specialized kernels:
- _kernel_parallel_linesearch_mv: mv = M @ search, ndrange(dof, env)
- _kernel_parallel_linesearch_jv: jv = J @ search, ndrange(constraint, env)
- _kernel_parallel_linesearch_p0: fused snorm/quad_gauss/eq_sum/p0_cost with shared memory reductions
- _kernel_parallel_linesearch_eval: K=16 log-spaced candidates evaluated in parallel with shared memory argmin
- _kernel_parallel_linesearch_apply_alpha_dofs: apply best alpha to qacc/Ma
- _kernel_parallel_linesearch_apply_alpha_constraints: apply best alpha to Jaref
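
For reviewers, the stages above can be sketched in NumPy as a minimal illustration, not the actual kernels. Everything here is an assumption: the toy quadratic cost, the candidate range, the array shapes, and the names `candidate_alphas` and `parallel_linesearch` are hypothetical. The real cost terms (snorm, quad_gauss, eq_sum) and the real `_log_scale` helper are not reproduced.

```python
import numpy as np

K = 16  # candidates per environment, matching the eval kernel above


def candidate_alphas(lo=1e-4, hi=1.0, k=K):
    # Hypothetical log-spaced grid; the range used by the PR is not stated.
    return np.logspace(np.log10(lo), np.log10(hi), k)


def parallel_linesearch(qacc, Ma, Jaref, search, M, J, b):
    """One batched linesearch step on a toy quadratic cost (an assumption):
    cost(x) = 0.5 x.Mx - b.x + 0.5 |Jx - r|^2, with Ma = Mx, Jaref = Jx - r,
    and M symmetric. Leading axis of every array is the environment."""
    # mv / jv stages: one matrix-vector product per environment.
    mv = np.einsum("eij,ej->ei", M, search)  # (env, dof)
    jv = np.einsum("ecj,ej->ec", J, search)  # (env, constraint)

    # p0 stage: per-env reductions giving the cost at alpha = 0 and the
    # coefficients of cost(alpha) = p0 + alpha * lin + 0.5 * alpha^2 * quad.
    p0 = (0.5 * np.sum(qacc * Ma, axis=1) - np.sum(b * qacc, axis=1)
          + 0.5 * np.sum(Jaref * Jaref, axis=1))
    lin = np.sum(search * (Ma - b), axis=1) + np.sum(jv * Jaref, axis=1)
    quad = np.sum(search * mv, axis=1) + np.sum(jv * jv, axis=1)

    # eval stage: all K candidates for all envs at once, then an argmin.
    alphas = candidate_alphas()
    costs = (p0[:, None] + alphas[None, :] * lin[:, None]
             + 0.5 * alphas[None, :] ** 2 * quad[:, None])
    best = np.argmin(costs, axis=1)
    best_cost = costs[np.arange(costs.shape[0]), best]
    # Keep alpha = 0 when no candidate improves on the current cost.
    alpha = np.where(best_cost < p0, alphas[best], 0.0)

    # apply stages: update the dof-sized and constraint-sized arrays.
    a = alpha[:, None]
    return qacc + a * search, Ma + a * mv, Jaref + a * jv, alpha
```

Because the candidate grid is fixed, no environment depends on another environment's iteration count, which is what lets the evaluation run as a single batched pass in this sketch.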

Also includes decomposed update_constraint (3 kernels) in the iteration loop.

Additional changes:
- Add dofs_info to func_solve_body dispatch signature
- Add _log_scale helper function to solver.py
- Exclude requires_grad runs from the decomposed path (the parallel linesearch is sensitive to floating-point precision)
- Update test_grad.py to pass dofs_info

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>