feat: add components_to_skip to OnnxBlockWiseRtnQuantization#2457
feat: add components_to_skip to OnnxBlockWiseRtnQuantization#2457titaiwangms wants to merge 9 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR extends Olive’s ONNX tooling to better support multi-component (composite) models by allowing selective processing of components during export and RTN quantization.
Changes:
- Add
components_to_skiptoOnnxBlockWiseRtnQuantizationto copy selected composite components through unmodified while quantizing the rest. - Add tests validating
components_to_skipbehavior on composite and non-composite models, including a warning on unknown component names. - Add
components_to_exporttoMobiusBuilderto export only a subset of components from multi-component mobius packages, with validation + new unit tests.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
olive/passes/onnx/rtn_quantization.py |
Adds components_to_skip config and custom composite-handling run() logic to skip quantization for selected components. |
test/passes/onnx/test_rtn_quantization.py |
Adds unit tests covering components_to_skip behavior and logging. |
olive/passes/onnx/mobius_model_builder.py |
Adds components_to_export config and filters which mobius package components are saved/returned. |
test/passes/onnx/test_mobius_model_builder.py |
Adds unit tests for components_to_export filtering/validation and updates the fake package save behavior. |
Add optional `components_to_skip` parameter (list of component names) to OnnxBlockWiseRtnQuantization pass. When a composite model component's name appears in this list, its model files are copied to the output path unchanged instead of being quantized. Non-listed components are quantized as usual. When not set (default None), all components are quantized (backward compatible). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove unused List from typing import (only Optional is needed) - Remove redundant required=False (implied by default_value=None) to match conventions of neighboring PassConfigParam declarations Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rride The base Pass.run() gates on self._initialized before calling _initialize(). When components_to_skip is set and the model is a CompositeModelHandler, the override bypasses super().run() entirely, so the init guard must be replicated explicitly. This ensures consistent lifecycle semantics if a subclass ever overrides _initialize(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Call _carry_forward_additional_files per-skipped component to mirror what base Pass.run() does for each individual component - Remove redundant AcceleratorSpec reimport in test _make_pass - Wrap long lines in test helpers to stay under 120 chars (Black) - Split compound assertion into two separate assertions for clarity Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Use ONNXModelHandler directly (remove OnnxHandler alias imported from internal module path) - Use list[str] type annotation for components_to_skip (matches mobius_model_builder.py convention from feat/mobius-builder-components-filter) - Add warning when components_to_skip names a component not found in the composite model (misspellings were silently ignored before) - Add comment explaining the _initialized guard in run() override: it mirrors Pass.run() behavior and is required because we bypass super().run() for composite models - Remove dead code: model_attributes was set in the constructor and then redundantly reassigned on the next line for skipped components Also adds test_components_to_skip_unknown_name_warns to cover the warning case. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…WiseRtnQuantization When copying a skipped component, onnx_file_name may be None if the ONNXModelHandler was constructed without an explicit file name. Fall back to 'model.onnx' which is the standard Olive convention to avoid passing None to the output ONNXModelHandler constructor. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
49d8455 to
4da7446
Compare
…antization-components-to-skip
Review (synthesized from 4 reviewers: readability / code / critical / deep)Convergent findings flagged with [N reviewers]. Severity follows worst-case among reviewers. Major1. In the skip branch of 2. Path comparison without if src_dir != component_output_path:
if component_output_path.exists():
shutil.rmtree(str(component_output_path))
shutil.copytree(str(src_dir), str(component_output_path))Lexical 3. Non-atomic A crash, signal, or disk-full between 4. SKIP branch silently lies about handler type [deep] output_component = ONNXModelHandler(
model_path=str(component_output_path),
onnx_file_name=onnx_file_name,
model_attributes=component_model.model_attributes,
)It unconditionally constructs 5. If if src.is_file():
onnx_file_name = getattr(component_model, "onnx_file_name", None) or src.name
else:
onnx_file_name = getattr(component_model, "onnx_file_name", None) or "model.onnx"6. The override copies the Minor
Praise
NoteThis PR is stacked on #2456 — please rebase / re-target to
|
1. copytree now uses src directly (not src.parent) when model_path is a file, to avoid accidentally copying sibling ONNX files from the parent directory. When model_path is a file, only that file and its .data sidecar are copied into the output component directory. 2. onnx_file_name fallback now uses Path(component_model.model_path).name instead of hardcoded 'model.onnx', correctly handling components whose ONNX file is not named 'model.onnx'. 3. Path comparison now uses .resolve() on both sides to handle symlinks and relative paths correctly, preventing accidental self-deletion. 4. get_model_components() result is cached in all_components to avoid calling the generator twice (once for name validation, once for iteration). 5. Docstring updated to note that unknown names in components_to_skip produce a warning (not an error) — skipping is non-fatal. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Expand OnnxBlockWiseRtnQuantization class docstring to document the components_to_skip parameter with example JSON config and behavior notes (unknown names warn, non-composite models unaffected) - Update docs/source/features/quantization.md RTN section to explain components_to_skip usage for multi-component VLM models, with a composite-model example config Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Add optional
components_to_skipparameter (list of component names) to theOnnxBlockWiseRtnQuantizationpass. When a composite model component's name appears in this list, its model files are copied to the output path unchanged instead of being quantized. Non-listed components are quantized as usual. DefaultNonequantizes all components (fully backward compatible).Depends on #2456 (MobiusBuilder
components_to_export). This PR's diff is scoped to RTN changes only — MobiusBuilder changes are in the base branch.Motivation
When quantizing large multi-component VLMs (e.g. Mistral-3 with decoder + vision_encoder + embedding), users may want to skip RTN quantization for specific components such as the embedding model — which may need to stay in higher precision (e.g. due to a
GatherBlockQuantizedbug with certain inputs).Changes
olive/passes/onnx/rtn_quantization.py:from typing import Optional(removed unusedList)components_to_skip: listPassConfigParam(defaultNone)run()to interceptCompositeModelHandlerprocessing: copy skipped components viashutil.copytree(handles file vs directorymodel_path), quantize the rest via recursiveself.run()onnx_file_namefallback: usegetattr(component_model, "onnx_file_name", None) or "model.onnx"to handle handlers constructed without an explicit file name_initialize()guard and post-processing (model_attributespropagation +_carry_forward_additional_filesper component)test/passes/onnx/test_rtn_quantization.py:TestRTNQuantizationComponentsToSkipclass with 5 testscomponents_to_skip=Nonequantizes all (backward compat)AcceleratorSpecreimport; wrapped long lines (120-char limit); split compound assertionsTesting
All 16 tests pass: