
PR7: feat(conformers): isolated nvMolKit worker backend #23

Closed
SergeiNikolenko wants to merge 4 commits into codex/pr5a-nvmolkit-worker-backend from codex/pr6a-inference-pipeline-fix


SergeiNikolenko (Member) commented Feb 26, 2026

Summary

This PR introduces an isolated nvMolKit-based conformer worker backend.

Goal: provide a fast conformer path without coupling the main Matcha runtime environment to nvMolKit ABI/toolchain dependencies.

Motivation

Importing nvMolKit in-process couples the core runtime environment to nvMolKit's ABI and toolchain requirements, which makes that environment fragile. This PR instead adopts a worker-process model:

  • the main process stays free of nvMolKit dependencies,
  • the worker runs in an isolated environment,
  • backend selection is explicit and configurable.

What Changed

1) Isolated worker package

  • Added the matcha-nvmolkit-worker package and its worker CLI entry point.

2) Setup and compatibility scripts

  • Added scripts/setup_nvmolkit_worker.py for environment bootstrap and smoke checks.
  • Added the scripts/nvmolkit_worker.py compatibility wrapper.
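A compatibility wrapper of this kind can be very thin: locate the interpreter of the isolated environment and forward the arguments to the package CLI. The sketch below assumes the worker environment lives in `.venv-nvmolkit-worker` (the directory removed in the auto-setup validation command) and that `matcha_nvmolkit_worker.cli` can be run with `python -m`; the helper names and this layout are hypothetical, not the PR's actual code.

```python
"""Hypothetical sketch of a thin compatibility wrapper around the worker CLI."""
import os
import subprocess
import sys
from pathlib import Path

# Assumption: setup_nvmolkit_worker.py creates the isolated environment here.
DEFAULT_WORKER_VENV = Path(".venv-nvmolkit-worker")


def worker_python(venv: Path = DEFAULT_WORKER_VENV) -> Path:
    """Return the interpreter inside the isolated worker venv (Windows or POSIX layout)."""
    if os.name == "nt":
        return venv / "Scripts" / "python.exe"
    return venv / "bin" / "python"


def build_worker_argv(args: list[str], venv: Path = DEFAULT_WORKER_VENV) -> list[str]:
    """Forward CLI arguments unchanged to the worker package's CLI module."""
    # Assumption: cli.py is runnable via `python -m matcha_nvmolkit_worker.cli`.
    return [str(worker_python(venv)), "-m", "matcha_nvmolkit_worker.cli", *args]


def main() -> int:
    argv = build_worker_argv(sys.argv[1:])
    if not Path(argv[0]).exists():
        print("nvMolKit worker env not found; run: "
              "uv run python scripts/setup_nvmolkit_worker.py", file=sys.stderr)
        return 1
    # Mirror the worker's exit code so callers can fail fast on errors.
    return subprocess.run(argv, check=False).returncode
```

In a real script, `main()` would be wired up as the entry point (for example via `sys.exit(main())` under a `__main__` guard). Keeping the wrapper free of nvMolKit imports is what lets it run from the main environment.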

3) Runtime integration

  • Added backend switching and worker autodiscovery in preprocessing, controlled by:
    • MATCHA_CONFORMER_BACKEND=auto|rdkit|worker
    • MATCHA_CONFORMER_WORKER_CMD
    • MATCHA_CONFORMER_WORKER_TIMEOUT_SEC
    • MATCHA_CONFORMER_WORKER_CHUNK_SIZE
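One way to consume these variables is a small frozen config object plus a chunking helper that splits the ligand list into per-call batches. The variable names come from this PR; the default timeout (600 s) and chunk size (256), the validation rules, and the helper names are assumptions for illustration only.

```python
"""Hypothetical sketch: reading the conformer backend settings from the environment."""
import os
from dataclasses import dataclass
from typing import Iterator, Mapping, Optional, Sequence, TypeVar

T = TypeVar("T")
VALID_BACKENDS = ("auto", "rdkit", "worker")


@dataclass(frozen=True)
class ConformerBackendConfig:
    backend: str
    worker_cmd: Optional[str]
    timeout_sec: float
    chunk_size: int


def load_config(env: Mapping[str, str] = os.environ) -> ConformerBackendConfig:
    backend = env.get("MATCHA_CONFORMER_BACKEND", "auto").strip().lower()
    if backend not in VALID_BACKENDS:
        raise ValueError(f"MATCHA_CONFORMER_BACKEND must be one of {VALID_BACKENDS}, got {backend!r}")
    # Assumption: these defaults are illustrative, not Matcha's real defaults.
    timeout = float(env.get("MATCHA_CONFORMER_WORKER_TIMEOUT_SEC", "600"))
    chunk = int(env.get("MATCHA_CONFORMER_WORKER_CHUNK_SIZE", "256"))
    if timeout <= 0 or chunk <= 0:
        raise ValueError("worker timeout and chunk size must be positive")
    # An empty string is treated the same as "not configured".
    cmd = env.get("MATCHA_CONFORMER_WORKER_CMD") or None
    return ConformerBackendConfig(backend, cmd, timeout, chunk)


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items (one worker call per slice)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Passing `env` explicitly keeps the parser testable without mutating `os.environ`; a command string such as `MATCHA_CONFORMER_WORKER_CMD` would typically be split with `shlex.split` before being handed to `subprocess`.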

4) Worker contract

  • Input: input.sdf + params.json
  • Output: output.sdf + meta.json
  • auto backend: falls back to RDKit if the worker fails
  • worker backend: fails fast (no RDKit fallback)
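The contract can be sketched from the main process's side: stage the inputs in a scratch directory, run the configured command with a timeout, check that both outputs exist, then apply the per-backend error policy. How the worker learns the exchange directory (here, as its last argument), the `WorkerError` type, and the callable-injection style are assumptions, not the PR's actual interface.

```python
"""Hypothetical sketch of the main-process side of the worker file contract."""
import json
import shlex
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Tuple


class WorkerError(RuntimeError):
    """Raised when the worker fails or violates the file contract."""


def call_worker(cmd: str, sdf_text: str, params: dict, timeout_sec: float) -> Tuple[str, dict]:
    """input.sdf + params.json in, output.sdf + meta.json out."""
    with tempfile.TemporaryDirectory(prefix="matcha-conf-") as tmp:
        work = Path(tmp)
        (work / "input.sdf").write_text(sdf_text)
        (work / "params.json").write_text(json.dumps(params))
        # Assumption: the worker receives the exchange directory as its last argument.
        argv = [*shlex.split(cmd), str(work)]
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_sec)
        except subprocess.TimeoutExpired as exc:
            raise WorkerError(f"worker timed out after {timeout_sec}s") from exc
        if proc.returncode != 0:
            raise WorkerError(f"worker exited with {proc.returncode}: {proc.stderr.strip()}")
        out_sdf, meta = work / "output.sdf", work / "meta.json"
        if not out_sdf.is_file() or not meta.is_file():
            raise WorkerError("worker did not produce output.sdf and meta.json")
        return out_sdf.read_text(), json.loads(meta.read_text())


def generate(backend: str,
             run_worker: Callable[[], Tuple[str, dict]],
             run_rdkit: Callable[[], Tuple[str, dict]]) -> Tuple[str, dict]:
    """Apply the backend policy: `worker` fails fast, `auto` falls back to RDKit."""
    if backend == "rdkit":
        return run_rdkit()
    try:
        return run_worker()
    except (WorkerError, OSError):  # OSError covers e.g. a missing worker executable
        if backend == "worker":
            raise  # fail fast: surface the worker error to the caller
        return run_rdkit()  # auto: degrade to the in-process RDKit path
```

Because only files cross the process boundary, the main environment never imports nvMolKit; the temporary directory is cleaned up whether the worker succeeds or not.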

5) Tests and docs

  • Added worker contract and conformer tests.
  • Updated README and .gitignore for worker setup/use.

File-Level Scope

  • .gitignore
  • README.md
  • matcha/utils/preprocessing.py
  • packages/matcha_nvmolkit_worker/README.md
  • packages/matcha_nvmolkit_worker/pyproject.toml
  • packages/matcha_nvmolkit_worker/src/matcha_nvmolkit_worker/__init__.py
  • packages/matcha_nvmolkit_worker/src/matcha_nvmolkit_worker/cli.py
  • scripts/nvmolkit_worker.py
  • scripts/setup_nvmolkit_worker.py
  • tests/test_conformer_generation_batch.py
  • tests/test_conformer_generation_in_memory.py
  • tests/test_conformer_worker_contract.py

Quick Validation Command

uv run python scripts/setup_nvmolkit_worker.py

MATCHA_CONFORMER_BACKEND=worker \
uv run matcha \
  -r data/protein.pdb \
  --ligand-dir data/ligands \
  --box-json data/box.json \
  --device cuda:0 \
  --n-samples 10 \
  --scorer gnina \
  --gnina-batch-mode combined \
  -o runs \
  --run-name pr7_worker_check \
  --overwrite

6) Auto-setup on first use

  • If MATCHA_CONFORMER_BACKEND is auto or worker and no worker command is configured or discovered,
    Matcha now performs a one-time automatic bootstrap via:
    uv run python scripts/setup_nvmolkit_worker.py --skip-smoke
  • New runtime controls:
    • MATCHA_CONFORMER_AUTO_SETUP_WORKER (default 1)
    • MATCHA_CONFORMER_WORKER_SETUP_TIMEOUT_SEC (default 3600)
  • If setup fails in auto mode, Matcha falls back to RDKit.
  • In worker mode, Matcha fails fast with a clear install hint.
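The auto-setup decision described above can be sketched as: try discovery, optionally run the bootstrap once within the setup timeout, re-run discovery, and otherwise either return `None` (auto falls back to RDKit) or raise with an install hint (worker mode). The environment variable names, their defaults, and the bootstrap command come from this PR; the function names, the injected `find_worker`/`run_setup` callables, and the accepted "off" values are hypothetical.

```python
"""Hypothetical sketch of the one-time worker auto-setup decision."""
import os
import subprocess
from typing import Callable, Mapping, Optional

SETUP_ARGV = ["uv", "run", "python", "scripts/setup_nvmolkit_worker.py", "--skip-smoke"]
INSTALL_HINT = "Install the worker with: uv run python scripts/setup_nvmolkit_worker.py"


def _run_setup(argv: list, timeout_sec: float) -> bool:
    """Run the bootstrap command; any failure or timeout counts as 'setup failed'."""
    try:
        return subprocess.run(argv, timeout=timeout_sec, check=False).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def ensure_worker(backend: str,
                  find_worker: Callable[[], Optional[str]],
                  env: Mapping[str, str] = os.environ,
                  run_setup: Callable[[list, float], bool] = _run_setup) -> Optional[str]:
    """Return a worker command, or None when `auto` should fall back to RDKit."""
    cmd = find_worker()
    if cmd:
        return cmd
    flag = env.get("MATCHA_CONFORMER_AUTO_SETUP_WORKER", "1").strip().lower()
    enabled = flag not in ("0", "false", "no")  # assumption: accepted "off" values
    timeout = float(env.get("MATCHA_CONFORMER_WORKER_SETUP_TIMEOUT_SEC", "3600"))
    if enabled and run_setup(SETUP_ARGV, timeout):
        cmd = find_worker()  # re-run discovery now that the environment exists
        if cmd:
            return cmd
    if backend == "worker":
        raise RuntimeError(f"nvMolKit worker unavailable. {INSTALL_HINT}")
    return None
```

The setup is "one-time" only in the sense that a successful bootstrap leaves an environment behind, so later runs are satisfied by the first `find_worker()` call.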

Auto-setup Validation Command

rm -rf .venv-nvmolkit-worker
MATCHA_CONFORMER_BACKEND=auto uv run matcha \
  -r data/protein.pdb \
  --ligand-dir data/ligands \
  --box-json data/box.json \
  --device cuda:0 \
  --n-samples 10 \
  --scorer gnina \
  --gnina-batch-mode combined \
  -o runs \
  --run-name pr7_auto_setup_check \
  --overwrite

SergeiNikolenko changed the title from "PR6a: fix(inference): review-driven pipeline fixes" to "PR6/4: fix(inference): review follow-up in pipeline scripts" (Feb 26, 2026)
SergeiNikolenko changed the base branch from codex/pr5b-nvmolkit-worker-refactor to codex/pr5a-nvmolkit-worker-backend (February 26, 2026 13:25)
SergeiNikolenko changed the title from "PR6/4: fix(inference): review follow-up in pipeline scripts" to "PR7: fix(inference): review follow-up in pipeline scripts" (Feb 26, 2026)
SergeiNikolenko force-pushed the codex/pr5a-nvmolkit-worker-backend branch from dd4356e to 9e046e3 (February 26, 2026 13:46)
SergeiNikolenko force-pushed the codex/pr6a-inference-pipeline-fix branch from b4141c5 to 2104c18 (February 26, 2026 13:47)
SergeiNikolenko changed the title from "PR7: fix(inference): review follow-up in pipeline scripts" to "PR7: feat(conformers): isolated nvMolKit worker backend" (Feb 26, 2026)
- Extract _rdkit_generate_chunk() to eliminate 3x duplicated blocks
- Consolidate max_failures/max_iterations into single max_attempts
- Extract _conf_has_finite_coords predicate in worker CLI
- Compact setup script: str.removeprefix(), f-strings, shorter docstrings
- Remove 8 obvious/redundant comments
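The refactor listed above (a shared per-chunk helper, a single `max_attempts` budget, and a finite-coordinate predicate) can be illustrated without RDKit. In this sketch, `generate_chunk` and `conf_has_finite_coords` are hypothetical stand-ins for `_rdkit_generate_chunk` and `_conf_has_finite_coords`, and `embed` is an injected callable standing in for the actual embedding call; with RDKit, coordinates could come from `Conformer.GetPositions()`.

```python
"""Hypothetical sketch of a retry loop with one max_attempts budget."""
import math
from typing import Callable, Iterable, List, Optional, Sequence

Point = Sequence[float]


def conf_has_finite_coords(positions: Iterable[Point]) -> bool:
    """True if every coordinate is finite (no NaN/inf left by a failed embedding)."""
    return all(math.isfinite(c) for xyz in positions for c in xyz)


def generate_chunk(mols: Sequence[object],
                   embed: Callable[[object, int], Optional[List[Point]]],
                   max_attempts: int = 3) -> List[Optional[List[Point]]]:
    """Embed each molecule, retrying up to `max_attempts` times."""
    results: List[Optional[List[Point]]] = []
    for mol in mols:
        coords = None
        for attempt in range(max_attempts):
            candidate = embed(mol, attempt)  # attempt index doubles as the random seed
            if candidate is not None and conf_has_finite_coords(candidate):
                coords = candidate
                break
        results.append(coords)  # None marks a molecule that exhausted its attempts
    return results
```

A single budget means "failed embedding" and "non-finite coordinates" consume the same counter, which is the behavior one would expect after merging `max_failures` and `max_iterations`.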
SergeiNikolenko force-pushed the codex/pr5a-nvmolkit-worker-backend branch from 9e046e3 to fdaef13 (February 26, 2026 15:55)
SergeiNikolenko force-pushed the codex/pr6a-inference-pipeline-fix branch from 2104c18 to 9d0adcc (February 26, 2026 15:55)
SergeiNikolenko (Member, Author) commented:

Superseded by #32 (restacked on #31, no codex/ branch names).

SergeiNikolenko deleted the codex/pr6a-inference-pipeline-fix branch (March 5, 2026 17:56)