fix: enforce changeset constraints in run/step handler pipeline#4563
Draft
fix: enforce changeset constraints in run/step handler pipeline#4563
Conversation
update_run/1 was using Ecto.Multi.update_all which silently discards changeset constraint annotations (foreign_key_constraint, assoc_constraint). This made constraints declared in Run.complete/2 and Step.finished/2 dead code. StartRun.call also didn't check changeset.valid? before persisting. - Rewrite update_run/1 to use Repo.update (honors constraints) - Restrict update_runs/2 to keyword-list only (Queue bulk path) - Add valid? check in StartRun.call before calling update_run - Add assoc_constraint(:output_dataclip) to Step.finished/2 - Extract broadcast_run_updates/1 as shared side-effect function - Keep Repo.exists? pre-check in CompleteRun for wire format compat
Adds observability to the permissive same-state and catch-all clauses in validate_state_change/1 before tightening the state machine.
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR changes
update_run/1fromupdate_all(ignores changeset validity) toRepo.update(enforces it). This means state transition rejections that were always declared invalidate_state_change/1but never enforced are now enforced.The critical case:
:lost→:success/:failedis now rejected. In production, this race happens when a worker completes a run after the janitor has already marked it lost. Previously the worker silently won the race. Now it gets an error.Before merging, we need to decide how to handle this — options include:
final → finaltransitions when the source is:lost(the janitor yields to the worker)The full list of transitions that are now actually enforced (previously dead code):
final → final(e.g.:lost→:success):available→ anything except:claimed:claimed→:startedwithoutstarted_atvalidate_requiredfinished_atvalidate_requiredDescription
This PR fixes constraint enforcement in the run/step handler pipeline — the hottest code path in the system (worker ↔ channel communication).
Origin: During review of #4531 (Save final state for runs), Taylor left a comment asking whether the
Repo.exists?pre-check onfinal_dataclip_idwas necessary, or overkill:Investigation revealed the check is necessary — but only as a workaround. The
foreign_key_constraint(:final_dataclip_id)declared inRun.complete/2is dead code becauseRuns.update_run/1usesEcto.Multi.update_all, which silently discards all changeset constraint annotations. Without theRepo.exists?check, a non-existent dataclip ID would crash the channel process with an unhandledPostgrex.Error.This turned out to be a broader issue affecting two of the four handler paths:
update_run→update_allvalid?not checked,update_allignores errorsupdate_run→update_allRepo.insertRepo.updateStep.finished/2missingassoc_constraintRoot cause:
update_runs/2was built forQueue.claim's bulkupdate_allpath with a CTE optimization fence.update_run/1piggybacked on it, destructuring changesets into raw k/v pairs and losing all constraint metadata.Fixes:
update_run/1to useRepo.update(honors all changeset constraints)update_runs/2to keyword-list only via guard (Queue bulk path unchanged)valid?check inStartRun.callbefore callingupdate_runassoc_constraint(:output_dataclip)toStep.finished/2broadcast_run_updates/1as shared side-effect functionRepo.exists?pre-check inCompleteRunfor wire format compatibility; FK constraint is now a safety netRelated to #4531
Validation steps
mix test test/lightning/runs/handlers_test.exs(8 pass, 3 skipped — skipped tests document a separate state machine catch-all issue)mix test test/lightning/runs_test.exs test/lightning_web/channels/run_channel_test.exs(89/89 pass)mix format --check-formatted && mix credo --strict --all— cleanAdditional notes for the reviewer
Repo.exists?check inCompleteRun.resolve_final_dataclipis intentionally kept. Removing it would change the wire format of the error response to the worker (%{errors: %{field: [msg]}}plain map vs%{field: [msg]}changeset format). The FK constraint now works as defense-in-depth.@tag :skip— they document a pre-existing gap inRun.validate_state_change/1where the catch-all{_from, _to} -> changesetsilently accepts transitions from final states. This is a separate concern.get_stepnot scoped torun_id, andfinal_dataclip_idexistence check not project-scoped. These should be tracked separately.Queue.claimis unaffected — it already passes keyword lists toupdate_runs/2.AI Usage
Pre-submission checklist
/reviewwith Claude Code)