
[Misc] Add named top-level loops #440

Merged
hughperkins merged 20 commits into main from hp/named-loops
Apr 2, 2026

Conversation

@hughperkins
Collaborator

Issue: #

Brief Summary

copilot:summary

Walkthrough

copilot:walkthrough

When a kernel has multiple top-level for loops, each becomes a separate
GPU kernel. This adds a `name` parameter to `qd.loop_config()` that
propagates through the compilation pipeline and appears in the compiled
kernel names, making them identifiable in profilers and traces.

Example: `qd.loop_config(name="compute_forces")` produces kernel names
like `my_kernel_c0_abc_0_compute_forces_range_for` instead of the
generic `my_kernel_c0_abc_0_range_for`.

The loop_name also appears in IR dumps (QD_DUMP_IR=1) as
`loop_name=...` on offloaded range_for/struct_for statements.
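The naming scheme can be sketched in Python; this is an illustrative model of the format only, not the actual codegen code (which uses fmt::format in the C++ backend):

```python
# Illustrative model of task-name assembly: insert the loop name between
# the codegen id and the task type only when a name was provided.
def task_kernel_name(kernel_name: str, task_codegen_id: int,
                     task_name: str, loop_name: str = "",
                     suffix: str = "") -> str:
    if loop_name:
        return f"{kernel_name}_{task_codegen_id}_{loop_name}_{task_name}{suffix}"
    return f"{kernel_name}_{task_codegen_id}_{task_name}{suffix}"

print(task_kernel_name("my_kernel_c0_abc", 0, "range_for", "compute_forces"))
# my_kernel_c0_abc_0_compute_forces_range_for
```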

Includes 6 unit tests covering correctness, name propagation in IR
dumps, no leakage between loops, mixed named/unnamed loops, and the
serialize combination.

Also fixed here: the clone() methods for RangeForStmt, StructForStmt,
and OffloadedStmt were not copying the loop_name field. Since the LLVM
codegen clones each offloaded task before compilation, the
user-provided loop name was lost.
parallelize=None,
block_dim_adaptive=True,
bit_vectorize=False,
name=None,

parallelize (int): The number of threads to use on CPU
block_dim_adaptive (bool): Whether to allow backends set block_dim adaptively, enabled by default
bit_vectorize (bool): Whether to enable bit vectorization of struct fors on quant_arrays.
name (str): Optional name for this loop, used in GPU kernel names for profiling and debugging.

if bit_vectorize:
_bit_vectorize()

if name is not None:

OK, so this is part of the qd.loop_config function that we place above for loops in Python scope, like:

qd.loop_config(name='foo',...)


auto task_kernel_name = fmt::format(
"{}_{}_{}{}", kernel_name, task_codegen_id, stmt->task_name(), suffix);
auto task_kernel_name =

note to self: loop name is coming from OffloadedStmt stmt

task_name_(fmt::format("{}_t{:02d}",
params.ti_kernel_name,
params.task_id_in_kernel)) {
task_name_(params.task_ir->loop_name.empty()

note to self: loop name coming from params.task_ir , but this is also an OffloadedStmt *, same as in codegen_llvm.cpp, so both get from OffloadedStmt.
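The conditional naming in that constructor can be modeled in a few lines of Python (a sketch; parameter names follow the C++ code, the logic is the ternary shown in the hunk above):

```python
# Sketch of the SPIR-V task-name construction: append the loop name only
# when it is non-empty, mirroring the ternary in the C++ constructor.
def spirv_task_name(ti_kernel_name: str, task_id_in_kernel: int,
                    loop_name: str = "") -> str:
    base = f"{ti_kernel_name}_t{task_id_in_kernel:02d}"
    return f"{base}_{loop_name}" if loop_name else base

print(spirv_task_name("kernel_0", 0, "myloop"))  # kernel_0_t00_myloop
```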

strictly_serialized(o.strictly_serialized),
mem_access_opt(o.mem_access_opt),
block_dim(o.block_dim) {
block_dim(o.block_dim),

note to self: FrontEndForStmt


void FrontendForStmt::init_config(Arch arch, const ForLoopConfig &config) {
is_bit_vectorized = config.is_bit_vectorized;
strictly_serialized = config.strictly_serialized;

note to self: FrontendForStmt initialized from ForLoopConfig

MemoryAccessOptions mem_access_opt;
int block_dim{0};
bool uniform{false};
std::string loop_name{""};

note to self: this is ForLoopConfig


ForLoopConfig struct definition

bool strictly_serialized;
MemoryAccessOptions mem_access_opt;
int block_dim;
std::string loop_name;

FrontendForStmt definition.


note to self: FrontendForStmt constructors take in a ForLoopConfig.

for_loop_dec_.config.block_dim = v;
}

void set_loop_name(const std::string &loop_name) {

note to self: this is in class ASTBuilder 🤔


note to self: for_loop_dec_ is a ForLoopDecoratorRecorder.


note to self: and it contains a config, which is of type ForLoopConfig


So in ASTBuilder, calling set_loop_name sets the loop name on the ForLoopConfig contained in the ForLoopDecoratorRecorder.
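A minimal Python model of that chain of ownership (class and field names mirror the C++ side; the reduced set of fields is an assumption for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class ForLoopConfig:                 # reduced to the fields discussed here
    block_dim: int = 0
    loop_name: str = ""

@dataclass
class ForLoopDecoratorRecorder:      # holds the config for the next for loop
    config: ForLoopConfig = field(default_factory=ForLoopConfig)

class ASTBuilder:
    def __init__(self) -> None:
        self.for_loop_dec_ = ForLoopDecoratorRecorder()

    def set_loop_name(self, loop_name: str) -> None:
        # Writes through to the ForLoopConfig inside the recorder.
        self.for_loop_dec_.config.loop_name = loop_name

builder = ASTBuilder()
builder.set_loop_name("compute_forces")
assert builder.for_loop_dec_.config.loop_name == "compute_forces"
```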

int block_dim,
bool strictly_serialized,
std::string range_hint)
std::string range_hint,

note to self: RangeForStmt

Replace tempfile.TemporaryDirectory with the tmp_path fixture in the
two offline IR dump tests.

Made-with: Cursor
Per-test qd.reset() resets the default compile config, so saving and
restoring debug_dump_path is unnecessary; keep restoring QD_DUMP_IR only.

Made-with: Cursor
Use monkeypatch.setenv so the env var is restored automatically instead
of a manual finally block.

Made-with: Cursor
Add a third range-for with loop_config(block_dim=64) but no name; assert
exactly two loop_name= annotations and each expected name appears once.

Made-with: Cursor
Remove test_loop_config_name_correctness; assert fill_a/fill_b ndarray
values in test_loop_config_name_ir_dump alongside c.

Made-with: Cursor
Covered by test_loop_config_name_ir_dump (named + unnamed-for IR and values).

Made-with: Cursor
Leak behavior is covered by IR dump tests (count loop_name=) and ndarray checks.

Made-with: Cursor
Single- vs two-name leak cases are already covered by IR counts in
test_loop_config_name_ir_dump.

Made-with: Cursor
Serialize path range-fors are printed without OffloadedStmt loop_name=;
include the stored loop name in the RangeForStmt line so after_offload
dumps stay searchable.

Made-with: Cursor
- Write PTX beside IR dumps under compile_config.debug_dump_path (with
  QD_DUMP_IR set) instead of hardcoded /tmp/ptx; match QUADRANTS_LOAD_PTX path.
- Add QD_DUMP_SPIRV=1 to emit *_before_opt.spirv / *_after_opt.spirv text
  without requiring QD_DUMP_IR.

Made-with: Cursor
- test_loop_config_name_cuda_ptx_dump: QD_DUMP_IR, asserts .version/.target in
  .ptx under debug_dump_path (CUDA arch only; skips when unsupported).
- test_loop_config_name_spirv_dump: QD_DUMP_SPIRV on metal/vulkan (vulkan
  excluded on Darwin), checks OpCapability in *_after_opt.spirv.

Made-with: Cursor
spirv_files = list(tmp_path.glob("*_after_opt.spirv"))
assert len(spirv_files) > 0, f"No *_after_opt.spirv under {tmp_path}"
combined = "\n".join(f.read_text() for f in spirv_files)
assert "OpCapability" in combined, combined

Irrelevant assert, and missing loop-name assert.

Offloaded task TaskAttributes::name already encodes loop_name (e.g.
kernel_0_t00_myloop). Use it for *_before_opt.spirv / *_after_opt.spirv
filenames so dumps match PTX-style discoverability. SPIR-V disassembly
does not contain that string; assert on the dumped filename.

Loop_config test adds name spirv_fill_loop and drops OpCapability check.

Made-with: Cursor
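The filename-based assertion described above can be sketched like so (the dumped file here is fabricated so the snippet is self-contained; the real test globs the actual dump directory produced under QD_DUMP_IR):

```python
import tempfile
from pathlib import Path

tmp_path = Path(tempfile.mkdtemp())
# Simulate a SPIR-V dump whose basename carries the loop name via
# TaskAttributes::name (e.g. kernel_0_t00_spirv_fill_loop).
(tmp_path / "kernel_0_t00_spirv_fill_loop_after_opt.spirv").write_text("...")

# Assert on the filename, since the loop name does not appear inside
# the SPIR-V disassembly itself.
named = list(tmp_path.glob("*_spirv_fill_loop_after_opt.spirv"))
assert len(named) == 1, f"No named SPIR-V dump under {tmp_path}"
```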
ptx_files = list(tmp_path.glob("*.ptx"))
assert len(ptx_files) > 0, f"No .ptx files under debug_dump_path {tmp_path}"
combined = "\n".join(f.read_text() for f in ptx_files)
assert ".version" in combined, combined

Irrelevant assert, and missing loop-name assert.

auto *root = params_.ir_root->as<Block>();

const char *dump_ir_env = std::getenv(DUMP_IR_ENV.data());
const char *dump_spirv_env = std::getenv(DUMP_SPIRV_ENV.data());

Don't think we want this.


TaskCodegen cgen(tp);
auto task_res = cgen.run();
const std::string &spirv_dump_basename = task_res.task_attribs.name;

@hughperkins hughperkins marked this pull request as ready for review April 1, 2026 23:38
@hughperkins
Collaborator Author

I have read and reviewed every line added in this PR. I take responsibility for the lines added and removed in this PR, and won't blame any issues on Opus.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c72d21a51b


Comment on lines +2024 to +2025
: fmt::format("{}_{}_{}_{}{}", kernel_name, task_codegen_id,
stmt->loop_name, stmt->task_name(), suffix);

P1: Sanitize loop_name before embedding it in task symbols

Building LLVM task function names directly from stmt->loop_name allows characters that are later rewritten by CUDA JIT name sanitization (jit_cuda.cpp convert()), but the launcher still looks up kernels using the original unsanitized OffloadedTask::name (kernel_launcher.cpp). If a user sets qd.loop_config(name=...) with characters like -, space, or /, PTX symbols are renamed while launch-time lookup uses the old name, causing kernel lookup failures at runtime for CUDA.
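One way to avoid the mismatch is to sanitize the name once, before it enters the pipeline. A hedged sketch (the character mapping here is a simplification; the real convert() in jit_cuda.cpp uses multi-character replacements such as _xx_ for /):

```python
# Simplified sanitizer: map anything outside [A-Za-z0-9_] to '_', so the
# symbol used at codegen time matches the one used at launch-time lookup.
def sanitize_loop_name(name: str) -> str:
    return "".join(ch if ch.isalnum() or ch == "_" else "_" for ch in name)

print(sanitize_loop_name("compute forces/v2"))  # compute_forces_v2
```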


Comment on lines 134 to 140
offloaded->body->insert(std::move(s->body->statements[j]));
}
offloaded->range_hint = s->range_hint;
offloaded->loop_name = s->loop_name;
root_block->insert(std::move(offloaded));
} else if (auto st = stmt->cast<StructForStmt>()) {
assemble_serial_statements();

🔴 When loop_config(serialize=True, name="foo") is used, the user-supplied name is silently dropped from compiled LLVM/PTX function names. The loop_name is correctly stored on the inner RangeForStmt (visible in IR dumps), but is never copied onto the outer serial OffloadedStmt that codegen uses to derive function names. The existing test test_loop_config_name_serialize_ir_dump only checks the *after_offload* IR text — where the inner RangeForStmt does print loop_name=serial_sum — so the gap in compiled kernel naming is undetected.

Extended reasoning...

What the bug is

When a user writes qd.loop_config(serialize=True, name="serial_sum") followed by a for loop, they expect the compiled GPU/CPU task function to include serial_sum in its name for profiling/debugging. The name is silently dropped: the resulting compiled function is named something like kernel_0_serial, not kernel_0_serial_sum_serial.

The code path that triggers it

In offload.cpp, the top-level offloading loop (starting line 93) branches on:

if (auto s = stmt->cast<RangeForStmt>(); s && !s->strictly_serialized) {
    // ... creates range_for OffloadedStmt and copies s->loop_name (line 137)
} else {
    pending_serial_statements->body->insert(std::move(stmt));  // line 165
}

When strictly_serialized=true, the condition s && !s->strictly_serialized is false. The RangeForStmt (which carries loop_name) is inserted into pending_serial_statements->body. The serial OffloadedStmt (pending_serial_statements) never has its own loop_name field set — it remains an empty string.

Why existing code does not prevent it

The non-serialized range_for path at line 137 does offloaded->loop_name = s->loop_name; correctly. The emit_struct_for path at line 261 likewise does offloaded_struct_for->loop_name = for_stmt->loop_name;. The serialized path is the only case missing this assignment.

Impact

LLVM codegen (codegen_llvm.cpp lines 2020–2025) and SPIR-V codegen (spirv_codegen.cpp line 67) both read stmt->loop_name from the OffloadedStmt to construct the task function name. Because that field stays empty for the serial case, any call to loop_config(serialize=True, name=...) produces a compiled function whose name does not contain the user-supplied string. Profiling tools that match by kernel name, and any user relying on named serial kernels, get silently wrong results.

Why the test does not catch it

The test test_loop_config_name_serialize_ir_dump searches for "loop_name=serial_sum" in files matching *after_offload*. After offloading, the IR structure is:

serial OffloadedStmt {           // loop_name = ""  (never set)
  body {
    RangeForStmt loop_name=serial_sum {  // printed here by ir_printer.cpp
      ...
    }
  }
}

The IR printer for RangeForStmt does print loop_name=serial_sum, so the text appears in the dump and the assertion passes. But the actual compiled function name is derived from the outer OffloadedStmt::loop_name, which is empty.

The batching ambiguity concern (refutation addressed)

A refutation argues this is intentional because multiple serialized loops may be batched into one serial OffloadedStmt, making it ambiguous which inner loop name to propagate. This is a valid architectural concern. However, the argument does not change the observable failure: the feature documentation states name is "used in GPU kernel names for profiling and debugging", and there is no caveat that it is silently ignored for serialized loops. More importantly, the PR adds a test (test_loop_config_name_serialize_ir_dump) whose name and assertions give the impression the name-propagation feature works for serialized loops — it does not, and the test gap allows this to go undetected. At minimum, the test should either (a) explicitly assert the compiled function name does NOT contain the user-supplied name (documenting the intentional behavior), or (b) be accompanied by a note in the docstring that name has no effect on compiled function names when serialize=True.

Step-by-step proof

  1. User calls qd.loop_config(serialize=True, name="serial_sum")
  2. set_loop_name("serial_sum") stores the string in for_loop_dec_.config.loop_name
  3. begin_frontend_range_for creates FrontendForStmt with loop_name="serial_sum" and strictly_serialized=true
  4. lower_ast.cpp creates RangeForStmt with loop_name="serial_sum" and strictly_serialized=true
  5. In offload.cpp line 96: s && !s->strictly_serialized evaluates as true && !true → false, so the else branch executes
  6. Line 165: the RangeForStmt is moved into pending_serial_statements->body; pending_serial_statements->loop_name is never set
  7. codegen_llvm.cpp line 2020: stmt->loop_name.empty() is true for the serial OffloadedStmt → name format is "{kernel}_{id}_serial", omitting "serial_sum" entirely
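The drop can be demonstrated with a minimal Python model of the offload branching; class and field names mirror the IR statements named above, everything else is illustrative:

```python
class RangeFor:
    def __init__(self, loop_name: str = "", strictly_serialized: bool = False):
        self.loop_name = loop_name
        self.strictly_serialized = strictly_serialized

class Offloaded:
    def __init__(self, task_type: str):
        self.task_type = task_type
        self.loop_name = ""        # stays empty unless explicitly copied
        self.body = []

def offload(stmts):
    tasks, pending = [], None
    for s in stmts:
        if isinstance(s, RangeFor) and not s.strictly_serialized:
            task = Offloaded("range_for")
            task.loop_name = s.loop_name    # copied on the parallel path
            tasks.append(task)
        else:
            if pending is None:
                pending = Offloaded("serial")
            pending.body.append(s)          # loop_name is never copied here
    if pending is not None:
        tasks.append(pending)
    return tasks

tasks = offload([RangeFor(loop_name="serial_sum", strictly_serialized=True)])
assert tasks[0].task_type == "serial"
assert tasks[0].loop_name == ""             # user-supplied name is dropped
```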

Comment on lines 425 to 439
}

void visit(RangeForStmt *for_stmt) override {
print("{} : {}for in range({}, {}) {}{}{{", for_stmt->name(),
const std::string loop_name_suffix =
for_stmt->loop_name.empty()
? std::string()
: fmt::format(" loop_name={}", for_stmt->loop_name);
print("{} : {}for in range({}, {}) {}{}{}{{", for_stmt->name(),
for_stmt->reversed ? "reversed " : "", for_stmt->begin->name(),
for_stmt->end->name(),
for_stmt->is_bit_vectorized ? "(bit_vectorized) " : "",
block_dim_info(for_stmt->block_dim));
block_dim_info(for_stmt->block_dim), loop_name_suffix);
for_stmt->body->accept(this);
print("}}");
dbg_info_printer_(for_stmt);

🟡 The visit(StructForStmt*) printer in ir_printer.cpp was not updated to print loop_name, even though this PR adds the field to StructForStmt and updates both visit(RangeForStmt*) and visit(OffloadedStmt*) to print it. Pre-offload IR dumps for struct-for loops with a user-defined name will silently omit the name, creating an inconsistency across loop types. To fix, apply the same loop_name_suffix pattern used in visit(RangeForStmt*) to visit(StructForStmt*).

Extended reasoning...

What the bug is

This PR adds a loop_name field to StructForStmt (statements.h:1063), propagates it through lower_ast.cpp (line 234) and offload.cpp (line 261), and updates visit(RangeForStmt*) in ir_printer.cpp (lines 427-439) and visit(OffloadedStmt*) (lines 690-706) to print loop_name. However, visit(StructForStmt*) at lines 442-451 was NOT updated and still prints only the snode name, bit-vectorized flag, scratch-pad info, and block_dim—never loop_name.

The specific code path

When a user writes qd.loop_config(name="my_struct_loop") before a struct-for loop, the name flows: ForLoopConfig.loop_name → FrontendForStmt.loop_name → StructForStmt.loop_name (via lower_ast) → OffloadedStmt.loop_name (via offload). The pre-offload IR dump calls visit(StructForStmt*), which omits the name entirely.

Why existing code does not prevent it

visit(StructForStmt*) was not touched in this PR's diff. The field exists on the C++ object but the printer simply never reads for_stmt->loop_name. There is no compile-time check enforcing consistency between what fields a statement carries and what the printer outputs.

Impact

This is a debug/tooling regression only. The actual functionality—loop name appearing in GPU kernel symbols—is unaffected because that reads from OffloadedStmt, which was correctly updated. However, anyone using QD_DUMP_IR to inspect the pre-offload IR for a struct-for loop with a name will see no loop_name= annotation, making it impossible to confirm the name was captured at that IR stage. This is inconsistent with RangeForStmt, which does show the name.

Step-by-step proof

  1. User calls qd.loop_config(name="fill_snode") before a struct-for loop.
  2. lower_ast creates a StructForStmt and sets new_for->loop_name = "fill_snode".
  3. Pre-offload IR dump calls IRPrinter::visit(StructForStmt*). Current output: $0 : struct for in dense block_dim=adaptive none{. No loop_name=fill_snode.
  4. After offload, OffloadedStmt correctly prints loop_name=fill_snode because that path was updated.
  5. By contrast, a RangeForStmt with the same name would print loop_name=fill_snode even at the pre-offload stage.

How to fix

Apply the same pattern as visit(RangeForStmt*): compute a loop_name_suffix string and append it to the format string in visit(StructForStmt*).
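The suggested fix is the same suffix helper already used for range-fors; as a Python sketch (the printed format mirrors the IR printer's output, details assumed):

```python
# Build the optional " loop_name=..." suffix and append it to the printed
# struct-for line, matching what visit(RangeForStmt*) now does.
def loop_name_suffix(loop_name: str) -> str:
    return f" loop_name={loop_name}" if loop_name else ""

def print_struct_for(name: str, snode: str, loop_name: str = "") -> str:
    return f"{name} : struct for in {snode}{loop_name_suffix(loop_name)} {{"

print(print_struct_for("$0", "dense", "fill_snode"))
# $0 : struct for in dense loop_name=fill_snode {
```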

Comment on lines 64 to 77
task_ir_(params.task_ir),
compiled_structs_(params.compiled_structs),
ctx_attribs_(params.ctx_attribs),
task_name_(fmt::format("{}_t{:02d}",
params.ti_kernel_name,
params.task_id_in_kernel)) {
task_name_(params.task_ir->loop_name.empty()
? fmt::format("{}_t{:02d}",
params.ti_kernel_name,
params.task_id_in_kernel)
: fmt::format("{}_t{:02d}_{}",
params.ti_kernel_name,
params.task_id_in_kernel,
params.task_ir->loop_name)) {
allow_undefined_visitor = true;
invoke_default_visitor = true;


🟡 The loop_name string is embedded raw into dump file paths without sanitization in both the SPIR-V (spirv_codegen.cpp:64-77) and LLVM/CUDA (jit_cuda.cpp:94-128) backends. If a user passes name="foo/bar" to loop_config(), the SPIR-V dump silently fails to write because std::filesystem::path::operator/ interprets the embedded slash as a subdirectory separator, and in the CUDA path the _before_ptx.ll dump uses an unsanitized name while the .ptx dump uses the post-convert() sanitized name, producing inconsistent filenames for the same kernel. Both issues only manifest with QD_DUMP_IR=1 and a loop name containing path-separator characters; fix by sanitizing loop_name before embedding it in any dump path (e.g. replace / with _).

Extended reasoning...

Root cause

This PR introduces loop_name as a user-provided string that is embedded directly into two different dump-file naming paths. Neither path validates the string before using it as a filesystem path component.

SPIR-V path (spirv_codegen.cpp:64-77)

task_name_ is constructed as "{kernel}_t{id:02d}_{loop_name}" when loop_name is non-empty. This string becomes spirv_dump_basename (line 2507 of the modified file), which is then used in:

std::filesystem::path filename =
    ir_dump_dir / (spirv_dump_basename + "_before_opt.spirv");

std::filesystem::path::operator/ does not treat the right-hand operand as a plain filename — it parses it for path separators. So if loop_name = "foo/bar", spirv_dump_basename becomes "kernel_0_t01_foo/bar" and the resulting path is ir_dump_dir/kernel_0_t01_foo/bar_before_opt.spirv. Only ir_dump_dir itself is created (line 2482); the subdirectory foo is never created, so the std::ofstream open silently fails. The if (std::ofstream out_file(filename); out_file) guard suppresses any error message, meaning the developer gets no dump and no diagnostic.

LLVM/CUDA path (jit_cuda.cpp:94-128)

The PR also embeds loop_name into LLVM function names (e.g. kernel_0_my-loop_range_for) in codegen_llvm.cpp. In add_module() in jit_cuda.cpp, moduleToDumpName(M.get()) is called at line ~98 before compile_module_to_ptx(M) runs. Inside compile_module_to_ptx(), convert() sanitizes all function names in-place (lines 241-244): it replaces / with _xx_, @ with _at_, < with _lb_, etc. The .ptx dump at line ~122 calls moduleToDumpName(M.get()) after that mutation, so:

  • _before_ptx.ll filename uses the original (unsanitized) function name
  • .ptx filename uses the sanitized function name

For any loop_name containing special characters, the two files are named differently for the same kernel. If loop_name contains /, the .ll path will also silently traverse into a subdirectory that does not exist (same failure mode as SPIR-V).

Proof-of-concept

  1. User writes qd.loop_config(name="net/layer") before a kernel loop.
  2. loop_name = "net/layer" propagates through FrontendForStmt → RangeForStmt → OffloadedStmt.
  3. In SPIR-V codegen, task_name_ = "my_kernel_t00_net/layer". Dump basename = "my_kernel_t00_net/layer". Path = <dump_dir>/my_kernel_t00_net/layer_before_opt.spirv. Directory <dump_dir>/my_kernel_t00_net/ does not exist → open fails silently.
  4. In LLVM codegen, LLVM function name = "my_kernel_0_net/layer_range_for". moduleToDumpName returns "my_kernel_0_net/layer_range_for". .ll dump path = <dump_dir>/my_kernel_0_net/layer_range_for_before_ptx.ll → fails silently. After compile_module_to_ptx, convert() renames to "my_kernel_0_net_xx_layer_range_for". .ptx dump path = <dump_dir>/my_kernel_0_net_xx_layer_range_for.ptx → succeeds but mismatches.

Why existing code does not prevent this

The Python-side loop_config() (misc.py:712-713) passes the name through without any validation. ForLoopDecoratorRecorder::reset() merely clears the string; there is no sanitization anywhere in the pipeline. The LLVM backend has convert() for its symbols, but this runs on the LLVM IR symbol for PTX correctness, not as a filename sanitizer, and it runs too late for the .ll dump. The SPIR-V backend has no equivalent sanitization at all.

Impact

Both issues are nit-level: they only affect the debug dump feature (QD_DUMP_IR=1) and only when the user provides a loop name containing path-separator characters. Failures are silent (no crash, no error message). Normal kernel execution is completely unaffected.

Fix

Sanitize loop_name before embedding it in any dump-related file name. A simple approach: when constructing spirv_dump_basename and dumpName (LLVM), replace / (and any other filesystem-unsafe characters) with _. Alternatively, sanitize loop_name once when it is first set (e.g. in ASTBuilder::set_loop_name) so all downstream consumers benefit.
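The path-separator hazard is easy to reproduce with Python's pathlib, whose / operator splits on embedded separators much like std::filesystem::path::operator/ does here:

```python
from pathlib import PurePosixPath

dump_dir = PurePosixPath("/tmp/ir_dumps")
basename = "my_kernel_t00_net/layer"   # loop name contains '/'
filename = dump_dir / (basename + "_before_opt.spirv")

# The '/' inside the basename acts as a directory separator, so the dump
# targets a subdirectory that was never created and the write fails.
assert filename.parent == PurePosixPath("/tmp/ir_dumps/my_kernel_t00_net")
assert filename.name == "layer_before_opt.spirv"
```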

Comment on lines 246 to 252
begin, end, body->clone(), is_bit_vectorized, num_cpu_threads, block_dim,
strictly_serialized);
new_stmt->reversed = reversed;
new_stmt->loop_name = loop_name;
return new_stmt;
}


🟣 This PR adds new_stmt->loop_name = loop_name to RangeForStmt::clone() but omits the analogous new_stmt->range_hint = range_hint, so cloned statements silently lose their range_hint. Since offload.cpp copies range_hint from RangeForStmt to OffloadedStmt (line 136), any clone that happens before offloading causes the field to arrive empty. This is a pre-existing bug, but the PR directly edited this exact clone() method and is the ideal place to add the missing copy.

Extended reasoning...

What the bug is

RangeForStmt::clone() (statements.cpp lines 244-251) constructs a new statement with only 7 positional arguments, leaving range_hint at its default empty-string value:

auto new_stmt = std::make_unique<RangeForStmt>(
    begin, end, body->clone(), is_bit_vectorized, num_cpu_threads, block_dim,
    strictly_serialized);
new_stmt->reversed = reversed;
new_stmt->loop_name = loop_name;  // added by this PR
// range_hint is never assigned!
return new_stmt;

The constructor signature is RangeForStmt(..., std::string range_hint = "", std::string loop_name = ""). The PR correctly passes and stores loop_name but leaves range_hint defaulting to "".

The specific code path that triggers it

range_hint is set by lower_ast.cpp when lowering an external-tensor for loop to a RangeForStmt, with a hint like "arg (0)" (lower_ast.cpp ~line 273). The offload pass at offload.cpp:136 then copies range_hint from the RangeForStmt to the OffloadedStmt. Any clone of the RangeForStmt that occurs between lowering and offloading — notably auto_diff.cpp:1447 which calls clone() to create reversed gradient loops — will produce a statement with an empty range_hint.

Why existing code does not prevent it

The RangeForStmt constructor takes range_hint by value with a default of "". The clone() override bypasses this parameter to avoid re-supplying all arguments, but then neglects to manually copy range_hint the way it copies reversed and loop_name. There is no assertion or validation that would catch a missing range_hint on a cloned node.

Impact

Impact is limited to cases where range_hint is non-empty on a RangeForStmt that gets cloned before offloading. The confirmed case is autodiff: auto_diff.cpp:1447 clones a RangeForStmt to build the reversed gradient loop, and the resulting clone carries an empty range_hint. When that clone is offloaded, offload.cpp:136 reads s->range_hint and propagates the empty string, so the resulting OffloadedStmt loses its range label. Profiling output and IR dumps for differentiated external-tensor loops will be missing the hint, and make_cpu_multithreaded_range_for.cpp will also lose the label for CPU thread tasks.

Step-by-step proof

  1. User writes @qd.kernel def k(a: qd.types.ndarray(...)): for i in a: ... and applies autodiff.
  2. lower_ast.cpp lowers the FrontendForStmt to RangeForStmt with range_hint = "arg (0)".
  3. auto_diff.cpp:1447 calls range_for_stmt->clone() to build the reversed gradient loop.
  4. Inside clone(), the new RangeForStmt is constructed without range_hint, so new_stmt->range_hint == "".
  5. offload.cpp:136 does offloaded->range_hint = s->range_hint — reading the empty string, so the OffloadedStmt has no range hint.
  6. Any profiling tool, IR printer, or downstream pass using range_hint for the gradient task gets an empty string instead of "arg (0)".

How to fix it

Add one line immediately after new_stmt->loop_name = loop_name;:

new_stmt->range_hint = range_hint;

This is exactly the pattern already used for reversed and loop_name, and matches how StructForStmt::clone() and OffloadedStmt::clone() copy their analogous fields.
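The forgotten-field pattern can be modeled in a few lines (a sketch; the real classes carry many more fields and the clone signature differs):

```python
class RangeForStmt:
    def __init__(self, begin: int, end: int,
                 range_hint: str = "", loop_name: str = ""):
        self.begin, self.end = begin, end
        self.range_hint = range_hint
        self.loop_name = loop_name

    def clone(self) -> "RangeForStmt":
        new_stmt = RangeForStmt(self.begin, self.end)
        new_stmt.loop_name = self.loop_name
        # Bug modeled here: range_hint is never copied, so it silently
        # resets to "".  Fix: new_stmt.range_hint = self.range_hint
        return new_stmt

orig = RangeForStmt(0, 10, range_hint="arg (0)", loop_name="fwd")
cloned = orig.clone()
assert cloned.loop_name == "fwd"
assert cloned.range_hint == ""   # the hint was lost in the clone
```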

@hughperkins hughperkins enabled auto-merge (squash) April 2, 2026 01:05
@hughperkins hughperkins merged commit 8b8f12c into main Apr 2, 2026
47 checks passed
@hughperkins hughperkins deleted the hp/named-loops branch April 2, 2026 02:00