feat(pathfinder): add CTK root canary probe for non-standard-path libs #1595
base: main
Conversation
Auto-sync is disabled for ready-for-review pull requests in this repository. Workflows must be run manually.
/ok to test

/ok to test
Libraries like nvvm whose shared object lives in a subdirectory (/nvvm/lib64/) that is not on the system linker path cannot be found via bare dlopen on system CTK installs without CUDA_HOME. Add a "canary probe" search step: when direct system search fails, system-load a well-known CTK lib that IS on the linker path (cudart), derive the CTK installation root from its resolved path, and look for the target lib relative to that root via the existing anchor-point logic. The mechanism is generic -- any future lib with a non-standard path just needs its entry in _find_lib_dir_using_anchor_point. The canary probe is intentionally placed after CUDA_HOME in the search cascade to preserve backward compatibility: users who have CUDA_HOME set expect it to be authoritative, and existing code relying on that ordering should not silently change behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
Force-pushed from e4066be to 44c0abd
/ok to test
/ok to test

/ok to test
def test_derive_ctk_root_windows_ctk13():
    path = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\bin\x64\cudart64_13.dll"
This works cross-platform due to the explicit use of ntpath in _derive_ctk_root_windows. Since the code wouldn't look much different with the platform-specific path module, it seems useful to keep these tests runnable everywhere rather than skipping a batch of them based on platform.
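For illustration, a derivation in this spirit could look like the sketch below. The function name and the exact layout rules are assumptions for the example, not the implementation under review; the point is only that ntpath parses Windows paths on any host OS.

```python
# Sketch only: an ntpath-based derivation in the spirit of _derive_ctk_root_windows.
# The name and the layout rules here are assumptions, not the PR's actual code.
import ntpath
from typing import Optional


def derive_ctk_root_windows_sketch(resolved_cudart_path: str) -> Optional[str]:
    """Map a resolved cudart DLL path to its CTK installation root."""
    head = ntpath.dirname(resolved_cudart_path)   # ...\bin\x64 or ...\bin
    if ntpath.basename(head).lower() == "x64":
        head = ntpath.dirname(head)               # ...\bin
    if ntpath.basename(head).lower() != "bin":
        return None                               # unexpected layout: give up
    return ntpath.dirname(head)                   # CTK root, e.g. ...\v13.0


# Because ntpath handles Windows paths regardless of the host OS, this assertion
# passes on Linux and macOS as well as on Windows:
assert derive_ctk_root_windows_sketch(
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\bin\x64\cudart64_13.dll"
) == r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0"
```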
Pull request overview
This PR adds a CTK root canary probe feature to the pathfinder library to resolve libraries that live in non-standard subdirectories (like libnvvm.so under $CTK_ROOT/nvvm/lib64/). The canary probe discovers the CUDA Toolkit installation root by loading a well-known library (cudart) that IS on the system linker path, deriving the CTK root from its resolved path, and then searching for the target library relative to that root.
Changes:
- Adds canary probe mechanism as a last-resort fallback after CUDA_HOME in the library search cascade
- Introduces CTK root derivation functions for Linux and Windows that extract installation paths from resolved library paths
- Provides comprehensive test coverage (21 tests) for all edge cases and search order behavior
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| cuda_pathfinder/tests/test_ctk_root_discovery.py | Comprehensive test suite covering CTK root derivation, canary probe mechanism, and search order priority |
| cuda_pathfinder/cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py | Implements the canary probe function and integrates it into the library loading cascade after CUDA_HOME |
| cuda_pathfinder/cuda/pathfinder/_dynamic_libs/find_nvidia_dynamic_lib.py | Adds CTK root derivation functions and try_via_ctk_root method to leverage existing anchor-point search logic |
Tests that create fake CTK directory layouts were hardcoded to Linux paths (lib64/, libnvvm.so) and failed on Windows where the code expects Windows layouts (bin/, nvvm64.dll). Extract platform-aware helpers (_create_nvvm_in_ctk, _create_cudart_in_ctk, _fake_canary_path) that create the right layout and filenames based on IS_WINDOWS. Co-authored-by: Cursor <cursoragent@cursor.com>
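As a rough illustration of what such a helper can look like: the name, layout details, and the IS_WINDOWS definition below are assumptions based on the commit message, not the PR's actual test code.

```python
# Rough sketch of a platform-aware test helper in the spirit of _create_nvvm_in_ctk;
# names and layout details are assumptions based on the description above.
import os

IS_WINDOWS = os.name == "nt"  # the real helpers key off a shared IS_WINDOWS constant


def create_fake_nvvm_in_ctk(ctk_root: str) -> str:
    """Create an empty fake nvvm library under a fake CTK root, using the per-platform layout."""
    if IS_WINDOWS:
        lib_dir = os.path.join(ctk_root, "nvvm", "bin")
        lib_name = "nvvm64.dll"
    else:
        lib_dir = os.path.join(ctk_root, "nvvm", "lib64")
        lib_name = "libnvvm.so"
    os.makedirs(lib_dir, exist_ok=True)
    lib_path = os.path.join(lib_dir, lib_name)
    open(lib_path, "w").close()  # file contents are irrelevant for path-discovery tests
    return lib_path
```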
/ok to test

/ok to test
The rel_paths for nvvm use forward slashes (e.g. "nvvm/bin") which os.path.join on Windows doesn't normalize, producing mixed-separator paths like "...\nvvm/bin\nvvm64.dll". Apply os.path.normpath to the returned directory so all separators are consistent. Co-authored-by: Cursor <cursoragent@cursor.com>
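A small illustration of the separator issue and the normpath fix; the concrete paths here are made-up examples, not values taken from the code.

```python
# Illustration of the mixed-separator issue and the os.path.normpath fix.
import os

ctk_root = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0"
rel_dir = "nvvm/bin"  # rel_paths are written with forward slashes

mixed = os.path.join(ctk_root, rel_dir, "nvvm64.dll")
# On Windows this yields '...\v13.0\nvvm/bin\nvvm64.dll' -- mixed separators,
# because os.path.join does not rewrite separators inside its arguments.

clean = os.path.normpath(mixed)
# normpath collapses everything to the native separator:
# '...\v13.0\nvvm\bin\nvvm64.dll'
```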
/ok to test
Problem

libnvvm.so lives in $CTK_ROOT/nvvm/lib64/, which is not on the system linker path. Every other CTK lib lives in $CTK_ROOT/lib64/, which ldconfig knows about. This means dlopen("libnvvm.so.4") fails on bare system CTK installs when CUDA_HOME is not set -- even though the library is right there on disk.

Pathfinder already handles nvvm for pip (nvidia-cuda-nvcc wheel) and conda ($CONDA_PREFIX/nvvm/lib64/), so the gap is specifically: system CTK, no CUDA_HOME, no pip, no conda.

Solution
Add a canary probe as a new search step. When direct system search fails:

1. System-load a well-known CTK lib that IS on the linker path (cudart).
2. Derive the CTK root from its resolved path (strip lib64/ on Linux, bin/ or bin/x64/ on Windows).
3. Look for the target lib relative to that root via _find_lib_dir_using_anchor_point, which already knows nvvm lives in nvvm/lib64/.

The mechanism is generic -- any future lib with a non-standard sub-path just needs its entry in the anchor-point table.
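To make the mechanism concrete, here is a minimal sketch of the probe on Linux. The helper names and the rel-path table are illustrative assumptions, not the PR's actual identifiers; the real logic lives in find_nvidia_dynamic_lib.py and load_nvidia_dynamic_lib.py, and the step of obtaining cudart's resolved path from the dynamic loader is elided here.

```python
# Minimal sketch of the canary-probe idea on Linux; names are assumptions.
import os
from typing import Optional

# Libs with non-standard sub-paths relative to the CTK root (the anchor-point idea).
NONSTANDARD_REL_DIRS = {"nvvm": "nvvm/lib64"}


def derive_ctk_root_linux(resolved_cudart_path: str) -> Optional[str]:
    """e.g. /usr/local/cuda-13.0/lib64/libcudart.so.13 -> /usr/local/cuda-13.0"""
    lib_dir = os.path.dirname(resolved_cudart_path)
    if os.path.basename(lib_dir) != "lib64":
        return None  # unexpected layout: give up rather than guess
    return os.path.dirname(lib_dir)


def find_via_canary(libname: str, resolved_cudart_path: str) -> Optional[str]:
    """Return the directory expected to hold `libname`, or None to fall through."""
    ctk_root = derive_ctk_root_linux(resolved_cudart_path)
    rel_dir = NONSTANDARD_REL_DIRS.get(libname)
    if ctk_root is None or rel_dir is None:
        return None
    candidate = os.path.normpath(os.path.join(ctk_root, rel_dir))
    return candidate if os.path.isdir(candidate) else None


# On a system CTK install without CUDA_HOME, this would yield
# "/usr/local/cuda-13.0/nvvm/lib64" if that directory exists:
candidate_dir = find_via_canary("nvvm", "/usr/local/cuda-13.0/lib64/libcudart.so.13")
```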
Search order

The canary fires after CUDA_HOME to preserve backward compatibility:

site-packages → conda → already-loaded → system dlopen → CUDA_HOME → canary probe

Users with CUDA_HOME set expect it to be authoritative; the canary is a last resort for when nothing else works.
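To show the ordering concretely, here is a small sketch of the try-in-order pattern. The step labels and helper name are illustrative, not the actual identifiers in load_nvidia_dynamic_lib.py.

```python
# Illustrative sketch of the try-in-order cascade; labels and names are assumptions.
from typing import Callable, Optional, Sequence, Tuple


def find_via_cascade(
    steps: Sequence[Tuple[str, Callable[[], Optional[str]]]],
) -> Optional[Tuple[str, str]]:
    """Run each search step in priority order; return the first (step, path) hit."""
    for name, step in steps:
        found = step()
        if found is not None:
            return name, found
    return None


# The canary probe sits last, so it can never shadow CUDA_HOME or any earlier step.
SEARCH_STEPS = (
    ("site-packages", lambda: None),
    ("conda", lambda: None),
    ("already-loaded", lambda: None),
    ("system dlopen", lambda: None),
    ("CUDA_HOME", lambda: None),
    ("ctk-root canary probe", lambda: "/usr/local/cuda-13.0/nvvm/lib64"),  # example hit
)

print(find_via_cascade(SEARCH_STEPS))
# ('ctk-root canary probe', '/usr/local/cuda-13.0/nvvm/lib64')
```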
Edge cases

| Scenario | Behavior |
|---|---|
| CUDA_HOME set | Resolved via CUDA_HOME; canary never runs |
| No CUDA_HOME, nvvm is the first request | Canary loads cudart via system search, derives root, finds nvvm |
| No CUDA_HOME, other libs loaded first | Canary runs the same way once nvvm is requested |
| No CTK on the system (no cudart either) | Raises DynamicLibNotFoundError |
| cudart found but CTK root has no nvvm | Returns None, falls through to error |
| CTK root cannot be derived from the cudart path | Returns None, falls through to error |

Changes
- find_nvidia_dynamic_lib.py -- derive_ctk_root() + try_via_ctk_root() on _FindNvidiaDynamicLib
- load_nvidia_dynamic_lib.py -- _try_ctk_root_canary() wired into the cascade
- tests/test_ctk_root_discovery.py -- 21 tests covering all of the above

Made with Cursor