Skip to content

Single file failure aborts entire PR review, discarding all results #74

@haasonsaas

Description

@haasonsaas

Found during deep code review of pipeline orchestration

`src/review/pipeline/orchestrate/dag.rs`, lines 274-299:

```rust
execute_review_jobs(prepared.jobs, ReviewExecutionContext { ... }).await?
```

The `?` propagates the first error from any file review job, aborting the entire DAG execution. If one file review fails (e.g., one file is too large and causes an LLM timeout), all successful file reviews from the same batch are discarded.

For a 20-file PR where 19 files review successfully and 1 times out, the user gets zero review comments instead of 19 files worth.

Also found

  • DAG stages consume inputs via `Option::take()` — retry would fail because the Option is already `None`, despite stages being marked `retryable: true`
  • No timeout on individual DAG stages — a hanging LLM call consumes the entire 300s webhook budget

Fix: Collect results with `Result` per file, report partial results, and continue reviewing remaining files.

Acceptance

  • Per-file error isolation (failed files don't abort the batch)
  • Partial results returned for successful files
  • Per-stage timeout (not just overall webhook timeout)
  • DAG retry compatibility (don't consume inputs destructively)

🤖 Generated with Claude Code

Metadata

Metadata

Assignees

No one assigned

    Labels

    area: review-pipelineReview pipeline, context, promptsbugSomething isn't workingpriority: highHigh priority enhancement

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions