Skip to content

feat: add components_to_skip to OnnxBlockWiseRtnQuantization#2457

Open
titaiwangms wants to merge 9 commits into
feat/mobius-builder-components-filterfrom
feat/rtn-quantization-components-to-skip
Open

feat: add components_to_skip to OnnxBlockWiseRtnQuantization#2457
titaiwangms wants to merge 9 commits into
feat/mobius-builder-components-filterfrom
feat/rtn-quantization-components-to-skip

Conversation

@titaiwangms
Copy link
Copy Markdown
Contributor

@titaiwangms titaiwangms commented May 8, 2026

Summary

Add optional components_to_skip parameter (list of component names) to the OnnxBlockWiseRtnQuantization pass. When a composite model component's name appears in this list, its model files are copied to the output path unchanged instead of being quantized. Non-listed components are quantized as usual. Default None quantizes all components (fully backward compatible).

Depends on #2456 (MobiusBuilder components_to_export). This PR's diff is scoped to RTN changes only — MobiusBuilder changes are in the base branch.

Motivation

When quantizing large multi-component VLMs (e.g. Mistral-3 with decoder + vision_encoder + embedding), users may want to skip RTN quantization for specific components such as the embedding model — which may need to stay in higher precision (e.g. due to a GatherBlockQuantized bug with certain inputs).

Changes

  • olive/passes/onnx/rtn_quantization.py:

    • from typing import Optional (removed unused List)
    • Add components_to_skip: list PassConfigParam (default None)
    • Override run() to intercept CompositeModelHandler processing: copy skipped components via shutil.copytree (handles file vs directory model_path), quantize the rest via recursive self.run()
    • onnx_file_name fallback: use getattr(component_model, "onnx_file_name", None) or "model.onnx" to handle handlers constructed without an explicit file name
    • Mirror base class _initialize() guard and post-processing (model_attributes propagation + _carry_forward_additional_files per component)
    • Non-composite (single) models are unaffected
  • test/passes/onnx/test_rtn_quantization.py:

    • New TestRTNQuantizationComponentsToSkip class with 5 tests
    • Skipped component passes through unchanged (MatMul present, no MatMulNBits)
    • Non-skipped components are quantized
    • components_to_skip=None quantizes all (backward compat)
    • Single-model (non-composite) unaffected
    • Unknown component name logs a warning (does not raise)
    • Removed redundant AcceleratorSpec reimport; wrapped long lines (120-char limit); split compound assertions

Testing

All 16 tests pass:

python3 -m pytest test/passes/onnx/test_rtn_quantization.py -v

Copilot AI review requested due to automatic review settings May 8, 2026 22:48
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends Olive’s ONNX tooling to better support multi-component (composite) models by allowing selective processing of components during export and RTN quantization.

Changes:

  • Add components_to_skip to OnnxBlockWiseRtnQuantization to copy selected composite components through unmodified while quantizing the rest.
  • Add tests validating components_to_skip behavior on composite and non-composite models, including a warning on unknown component names.
  • Add components_to_export to MobiusBuilder to export only a subset of components from multi-component mobius packages, with validation + new unit tests.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
olive/passes/onnx/rtn_quantization.py Adds components_to_skip config and custom composite-handling run() logic to skip quantization for selected components.
test/passes/onnx/test_rtn_quantization.py Adds unit tests covering components_to_skip behavior and logging.
olive/passes/onnx/mobius_model_builder.py Adds components_to_export config and filters which mobius package components are saved/returned.
test/passes/onnx/test_mobius_model_builder.py Adds unit tests for components_to_export filtering/validation and updates the fake package save behavior.

Comment thread olive/passes/onnx/rtn_quantization.py Outdated
Comment thread olive/passes/onnx/mobius_model_builder.py
Comment thread olive/passes/onnx/mobius_model_builder.py
titaiwangms and others added 6 commits May 8, 2026 23:12
Add optional `components_to_skip` parameter (list of component names) to
OnnxBlockWiseRtnQuantization pass. When a composite model component's name
appears in this list, its model files are copied to the output path unchanged
instead of being quantized. Non-listed components are quantized as usual.
When not set (default None), all components are quantized (backward compatible).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove unused List from typing import (only Optional is needed)
- Remove redundant required=False (implied by default_value=None)
  to match conventions of neighboring PassConfigParam declarations

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rride

The base Pass.run() gates on self._initialized before calling _initialize().
When components_to_skip is set and the model is a CompositeModelHandler,
the override bypasses super().run() entirely, so the init guard must be
replicated explicitly. This ensures consistent lifecycle semantics if a
subclass ever overrides _initialize().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Call _carry_forward_additional_files per-skipped component to mirror
  what base Pass.run() does for each individual component
- Remove redundant AcceleratorSpec reimport in test _make_pass
- Wrap long lines in test helpers to stay under 120 chars (Black)
- Split compound assertion into two separate assertions for clarity

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Use ONNXModelHandler directly (remove OnnxHandler alias imported from
  internal module path)
- Use list[str] type annotation for components_to_skip (matches
  mobius_model_builder.py convention from feat/mobius-builder-components-filter)
- Add warning when components_to_skip names a component not found in the
  composite model (misspellings were silently ignored before)
- Add comment explaining the _initialized guard in run() override: it mirrors
  Pass.run() behavior and is required because we bypass super().run() for
  composite models
- Remove dead code: model_attributes was set in the constructor and then
  redundantly reassigned on the next line for skipped components

Also adds test_components_to_skip_unknown_name_warns to cover the warning case.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…WiseRtnQuantization

When copying a skipped component, onnx_file_name may be None if the
ONNXModelHandler was constructed without an explicit file name. Fall back to
'model.onnx' which is the standard Olive convention to avoid passing None to
the output ONNXModelHandler constructor.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@titaiwangms titaiwangms force-pushed the feat/rtn-quantization-components-to-skip branch from 49d8455 to 4da7446 Compare May 8, 2026 23:12
@titaiwangms titaiwangms changed the base branch from main to feat/mobius-builder-components-filter May 8, 2026 23:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread olive/passes/onnx/rtn_quantization.py Outdated
Comment thread olive/passes/onnx/rtn_quantization.py Outdated
@titaiwangms
Copy link
Copy Markdown
Contributor Author

Review (synthesized from 4 reviewers: readability / code / critical / deep)

Convergent findings flagged with [N reviewers]. Severity follows worst-case among reviewers.

Major

1. copytree(src.parent, ...) can copy unrelated siblings [code]

In the skip branch of run(): when component_model.model_path points to an .onnx file (not a directory), the code does src_dir = src.parent and copies the whole parent. If multiple components share a parent directory (flat composite layout), the skip silently bloats/corrupts the output by dragging in siblings. Fix: copy the ONNX file + its .data sidecar + listed additional_files explicitly, not the entire parent dir.

2. Path comparison without .resolve() risks deleting the source [critical]

if src_dir != component_output_path:
    if component_output_path.exists():
        shutil.rmtree(str(component_output_path))
    shutil.copytree(str(src_dir), str(component_output_path))

Lexical Path compare. If absolute and relative forms point to the same dir (very plausible in stacked passes, symlinked workspaces, or case-insensitive FS), the inequality check passes, then rmtree(component_output_path) deletes the actual source before copytree reads from it. Fix: compare .resolve()'d paths, and reject overlapping src/dst.

3. Non-atomic rmtree+copytree on large external-data dirs [critical]

A crash, signal, or disk-full between rmtree and the end of copytree leaves the output unusable. Fix: stage to a sibling dir, validate, then atomic-rename.

4. SKIP branch silently lies about handler type [deep]

output_component = ONNXModelHandler(
    model_path=str(component_output_path),
    onnx_file_name=onnx_file_name,
    model_attributes=component_model.model_attributes,
)

It unconditionally constructs ONNXModelHandler, but CompositeModelHandler permits any OliveModelHandler (nested composite, distributed ONNX, PyTorch). Latent today since Mobius produces only ONNX components, but worth at least an isinstance(component_model, ONNXModelHandler) assert with a clear error, plus a docstring constraint.

5. onnx_file_name fallback hardcodes "model.onnx" [code]

If src.is_file() and the input handler has no explicit onnx_file_name, the output handler points to model.onnx even if the source filename was different (e.g. decoder.onnx). Fix:

if src.is_file():
    onnx_file_name = getattr(component_model, "onnx_file_name", None) or src.name
else:
    onnx_file_name = getattr(component_model, "onnx_file_name", None) or "model.onnx"

6. _initialized guard duplicates a private base-class protocol [readability]

The override copies the _initialized / _initialize() dance from Pass.run() (olive_pass.py:209-211). The comment is honest about it, but the coupling is fragile — if Pass.run() ever adds a second guard or renames the attribute, this silently diverges with no test catching the drift. Long-term: add _ensure_initialized() to the base class so overrides can call it without copying. Short-term: at least add a class-level comment that this assumes _initialize() is a no-op or idempotent.

Minor

  • model.get_model_components() is called twice (once for the unknown-name warning, once for the loop). The property has side-effects merging parent attrs into children — idempotent today, but stylistically off and the base only iterates once. Materialize once: pairs = list(model.get_model_components()).
  • Inconsistent strictness with feat: add components_to_export filter to MobiusBuilder pass #2456: unknown component name → raise in feat: add components_to_export filter to MobiusBuilder pass #2456, warn in feat: add components_to_skip to OnnxBlockWiseRtnQuantization #2457. Pick one and document.
  • Path(output_model_path).with_suffix("") strips dotted-dir names like model.v1 (inherited footgun from base, not a regression).
  • Docstring should explain why _accepts_composite_model = True wasn't used instead — short answer: _run_for_config doesn't receive the component name, so a per-component skip check has no name to key on. Worth saying explicitly.
  • Add regression tests for: (a) non-model.onnx filenames, (b) file-based components in shared parent dirs.

Praise

  • Recursing via self.run(component_model, ...) for the non-skipped branch correctly inherits all base-class per-component behavior — clean design that avoids re-implementing _run_for_config invocation.
  • Deep-trace confirmed that model_attributes propagation and _carry_forward_additional_files are correctly preserved with no double-carry on additional files (the helper checks if not output_filepath.exists() before copying).
  • _initialize() idempotency: verified safe — base _initialize() is a no-op and not overridden on this pass; the recursive super().run() call will see _initialized = True and skip.

Note

This PR is stacked on #2456 — please rebase / re-target to main once #2456 lands.

Posted by GitHub Copilot CLI on behalf of @titaiwangms.

titaiwangms and others added 2 commits May 11, 2026 21:12
1. copytree now uses src directly (not src.parent) when model_path is a
   file, to avoid accidentally copying sibling ONNX files from the parent
   directory. When model_path is a file, only that file and its .data
   sidecar are copied into the output component directory.

2. onnx_file_name fallback now uses Path(component_model.model_path).name
   instead of hardcoded 'model.onnx', correctly handling components whose
   ONNX file is not named 'model.onnx'.

3. Path comparison now uses .resolve() on both sides to handle symlinks
   and relative paths correctly, preventing accidental self-deletion.

4. get_model_components() result is cached in all_components to avoid
   calling the generator twice (once for name validation, once for iteration).

5. Docstring updated to note that unknown names in components_to_skip
   produce a warning (not an error) — skipping is non-fatal.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Expand OnnxBlockWiseRtnQuantization class docstring to document the
  components_to_skip parameter with example JSON config and behavior notes
  (unknown names warn, non-composite models unaffected)
- Update docs/source/features/quantization.md RTN section to explain
  components_to_skip usage for multi-component VLM models, with a
  composite-model example config

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants