
Diffusion ci tests #1495

Open
CharlelieLrt wants to merge 10 commits into NVIDIA:main from CharlelieLrt:diffusion-ci-tests

Conversation

@CharlelieLrt
Collaborator

PhysicsNeMo Pull Request

Description

Checklist

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI's assessment of merge readiness; it is not a qualitative judgment of your work, nor an indication that the PR will be accepted or rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

@CharlelieLrt CharlelieLrt self-assigned this Mar 13, 2026
@CharlelieLrt
Collaborator Author

/blossom-ci

@coreyjadams
Collaborator

Hey @greptile, this PR introduces nearly 400 binary files for numerical checkpoint and consistency tests, which might complicate git cloning. Got any suggestions for handling that?

@greptile-apps
Contributor

greptile-apps bot commented Mar 13, 2026

Here are a few approaches to handle those binary checkpoint files:

1. Git LFS (most common)

  • Move binary files to Git LFS - keeps them out of clone history
  • Add to .gitattributes: *.npy filter=lfs diff=lfs merge=lfs -text (or whatever format you're using)
  • Clones stay fast, files downloaded only when needed
  • CI can use GIT_LFS_SKIP_SMUDGE=1 to skip download, then selectively fetch

2. External storage with download script

  • Store checkpoints in S3/GCS/Azure blob storage
  • Add a scripts/download_test_fixtures.py that fetches them before tests
  • CI downloads on demand, developers only download if running those specific tests
  • Keep a manifest file (JSON/YAML) tracking what's available

3. Hybrid: essential vs full test suites

  • Keep a minimal set (~10-20 files) for basic CI checks in repo
  • Store the full 400-file suite externally for comprehensive testing
  • Run full suite only on release branches or manually triggered workflows
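One way to wire up the essential-vs-full split, sketched with the standard library's `unittest` (the environment-variable name and test names are assumptions, not from this PR):

```python
import os
import unittest

# Gate for the comprehensive suite: runs only when the full, externally
# stored fixture set has been downloaded (env var name is an assumption).
requires_full_fixtures = unittest.skipUnless(
    os.environ.get("PHYSICSNEMO_FULL_TESTS") == "1",
    "full fixture suite not downloaded; set PHYSICSNEMO_FULL_TESTS=1",
)

class EssentialChecks(unittest.TestCase):
    """Basic CI: uses only the small in-repo fixture set."""

    def test_small_checkpoint(self):
        self.assertTrue(True)  # placeholder for a real numerical check

class FullConsistencySuite(unittest.TestCase):
    """Release branches / manual triggers: needs the full fixture set."""

    @requires_full_fixtures
    def test_all_checkpoints(self):
        self.assertTrue(True)  # placeholder for the comprehensive checks
```

The gated tests are reported as skipped rather than failed when the fixtures are absent, so a plain CI run stays green without the 400-file download.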

4. Programmatic generation

  • If checkpoints are deterministic, generate them in CI from smaller seed files
  • Only store the configuration/seeds (much smaller)
  • Trade storage for compute time
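A stdlib-only sketch of the deterministic-generation idea, assuming reference data really can be reproduced from a seed (the spec format and file layout are hypothetical; the actual checkpoints in this PR may not be regenerable this way):

```python
import json
import random
import struct
from pathlib import Path

def generate_fixture(seed: int, n: int) -> bytes:
    """Deterministically regenerate n float32 reference values from a seed."""
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return struct.pack(f"<{n}f", *values)  # little-endian float32 payload

def materialize(spec_file: str, out_dir: str) -> None:
    """Rebuild all fixtures from a small JSON spec instead of storing binaries.

    Assumed spec format: {"name.bin": {"seed": 0, "size": 1024}, ...}
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    spec = json.loads(Path(spec_file).read_text())
    for name, cfg in spec.items():
        (out / name).write_bytes(generate_fixture(cfg["seed"], cfg["size"]))
```

Only the tiny JSON spec is versioned; CI regenerates the binaries on demand, trading a little compute for hundreds of files of repository storage.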

For PhysicsNeMo specifically, I'd suggest Git LFS + lazy loading in CI (option 1) since it:

  • Maintains reproducibility (files versioned with code)
  • Keeps clones fast
  • Integrates cleanly with existing workflows

What kind of files are these - numpy arrays, model weights, HDF5?

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
@CharlelieLrt
Collaborator Author

/blossom-ci

… for compilation tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
@CharlelieLrt
Collaborator Author

/blossom-ci

