fix: handle silent schema sync failures in branch-scoped DB middleware #2999
amikofalvy wants to merge 1 commit into main from
Conversation
Schema sync failures during Dolt branch checkout were silently swallowed,
causing queries to run against branches with outdated schemas (missing
columns like is_work_app), producing 500 errors with unhelpful cause: {}
in logs.
- Capture and inspect checkoutBranch result for sync failures
- Retry schema sync up to 2 times on lock contention
- Log explicit errors when sync fails after retries
- Serialize pg Error cause chain properly (non-enumerable properties)
so actual DB errors (code, message, severity) appear in logs
Made-with: Cursor
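The serialization bullet above is the subtle part: standard `Error` fields and pg's extras are non-enumerable, so a naive `JSON.stringify()` of a cause yields `{}`. The following is a minimal sketch of the idea, not the PR's actual `serializeCause()` implementation — only the extracted field list (`code`, `message`, `severity`, `detail`, `hint`) and the recursion bound come from this PR's description:

```ts
// Sketch of the serializeCause() idea: walk err.cause and copy the
// non-enumerable fields explicitly. Depth-bounded to stop runaway chains.
function serializeCause(err: unknown, depth = 0): Record<string, unknown> | undefined {
  if (!(err instanceof Error) || depth >= 3) return undefined;
  const e = err as Error & {
    code?: string; severity?: string; detail?: string; hint?: string; cause?: unknown;
  };
  return {
    message: e.message, // non-enumerable on Error, invisible to JSON.stringify
    code: e.code,
    severity: e.severity,
    detail: e.detail,
    hint: e.hint,
    cause: serializeCause(e.cause, depth + 1),
  };
}

// Simulate a pg-style error: its extra fields are defined non-enumerable.
const dbErr = new Error('column "is_work_app" does not exist');
Object.defineProperty(dbErr, 'code', { value: '42703', enumerable: false });

console.log(JSON.stringify(dbErr));       // {} — the original symptom
console.log(serializeCause(dbErr)?.code); // 42703
```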
TL;DR — Fixes silent 500 errors caused by schema sync failures during Dolt branch checkout. The middleware now inspects the `checkoutBranch` result.

Key changes (2 files, 1 commit):
- Schema sync retry in branch-scoped DB middleware — the retry loop targets a narrow failure mode.
- Proper serialization of database error causes.
Solid fix for a real production pain point — the root cause analysis on non-enumerable pg Error properties is spot-on. Two actionable issues to address before merging: the retry loop fires on unretryable failures (merge conflicts), and dbError is a misleading field name for a generic cause serializer.
Claude Opus
```ts
while (
  checkoutResult.schemaSync.hadDifferences &&
  !checkoutResult.schemaSync.performed &&
  syncAttempts < MAX_SCHEMA_SYNC_RETRIES
) {
  syncAttempts++;
  logger.warn(
    {
      branch: resolvedRef.name,
      error: checkoutResult.schemaSync.error,
      attempt: syncAttempts,
    },
    'Schema sync not performed, retrying'
  );
  await new Promise((resolve) => setTimeout(resolve, 50 * syncAttempts));
  checkoutResult = await checkoutBranch(requestDb)({
    branchName: resolvedRef.name,
    syncSchema: true,
    autoCommitPending: true,
  });
}
```
The retry condition (hadDifferences && !performed) fires on all sync failure modes — lock contention, merge conflicts, uncommitted-changes errors, and generic failures. But the PR description says this targets lock contention specifically. Retrying merge conflicts or "uncommitted changes" errors will fail identically each time, burning 2 retries + backoff for no benefit.
Consider narrowing the retry condition to lock contention only. The SchemaSyncResult already has a skippedDueToLock field, but it's dropped during the mapping to CheckoutBranchResult. Either:
- Forward `skippedDueToLock` to `CheckoutBranchResult` and gate the retry on it, or
- At minimum, only retry when `error` is `undefined` (which currently signals lock contention, since the lock-skip path doesn't set an error string).
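The first option could look like the sketch below. The type and field names (`skippedDueToLock`, `hadDifferences`, `performed`) mirror the review discussion; the real types live in the `checkoutBranch` module, so treat this as illustrative:

```ts
// Hypothetical shape, assuming skippedDueToLock is forwarded from
// SchemaSyncResult to the checkout result.
interface SchemaSyncStatus {
  hadDifferences: boolean;
  performed: boolean;
  skippedDueToLock?: boolean;
  error?: string;
}

const MAX_SCHEMA_SYNC_RETRIES = 2;

// Retry only the transient case (lock contention). Merge conflicts and
// uncommitted-changes errors fail identically on every attempt.
function shouldRetrySync(sync: SchemaSyncStatus, attempts: number): boolean {
  return (
    sync.hadDifferences &&
    !sync.performed &&
    sync.skippedDueToLock === true &&
    attempts < MAX_SCHEMA_SYNC_RETRIES
  );
}

console.log(shouldRetrySync({ hadDifferences: true, performed: false, skippedDueToLock: true }, 0));  // true
console.log(shouldRetrySync({ hadDifferences: true, performed: false, error: 'merge conflict' }, 0)); // false
```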
```ts
logger.warn(
  {
    branch: resolvedRef.name,
    error: checkoutResult.schemaSync.error,
    attempt: syncAttempts,
  },
  'Schema sync not performed, retrying'
);
```
checkoutResult.schemaSync.error is undefined in the lock contention path — SchemaSyncResult.skippedDueToLock is set but not forwarded to CheckoutBranchResult.schemaSync.error. This warning log will show error: undefined, which makes it hard for operators to understand why the retry happened. Consider logging a meaningful string like 'skipped due to lock contention' when the error field is absent, or better yet, forward skippedDueToLock from the result.
```ts
error: err,
message: errorMessage,
stack: errorStack,
...(cause && { dbError: cause }),
```
dbError is a misleading field name — serializeCause handles any error's cause chain (HTTP errors, validation wrappers, etc.), not just database errors. Since this is a new field with no downstream consumers, renaming to cause would be more accurate and future-proof. If you want to avoid collision with Pino's own cause handling, errorCause (already used in AgentSession.ts:1825) would be consistent with existing patterns.
```diff
- ...(cause && { dbError: cause }),
+ ...(cause && { errorCause: cause }),
```
```ts
{
  error: err,
  message: errorMessage,
  ...(cause && { dbError: cause }),
```
Same field naming concern as above.
```diff
- ...(cause && { dbError: cause }),
+ ...(cause && { errorCause: cause }),
```
PR Review Summary
Total Issues: 3 | Risk: Medium

🟠 Major (1)
🟠 1) branchScopedDb.ts:119-128 Schema sync failure allows degraded operation without user feedback
Issue: After exhausting schema sync retries, the middleware logs an error but continues request execution without any indication to the API caller. The user receives a normal response even though subsequent queries may fail unpredictably due to outdated schema (e.g., missing columns like is_work_app).
Why: This creates a debugging nightmare where the symptom (column errors, 500s) is disconnected from the cause (schema sync failure). The log message helps operators but provides no feedback to API consumers. For write operations especially, proceeding with a stale schema is risky since data may be written to incomplete tables.
Fix: Consider one or more of the following:
- Set a response header for observability without breaking the request:

```ts
if (checkoutResult.schemaSync.hadDifferences && !checkoutResult.schemaSync.performed) {
  logger.error(...);
  c.header('X-Schema-Sync-Warning', 'Schema sync failed; queries may use outdated schema');
}
```

- Add span attributes so downstream failures can be correlated via tracing:

```ts
import { trace } from '@opentelemetry/api';
// ...
const span = trace.getActiveSpan();
span?.setAttribute('schema_sync.failed', true);
span?.setAttribute('schema_sync.attempts', syncAttempts + 1);
```

- Fail fast for write operations (stricter option):

```ts
const isWriteMethod = !['GET', 'HEAD', 'OPTIONS'].includes(method);
if (checkoutResult.schemaSync.hadDifferences && !checkoutResult.schemaSync.performed && isWriteMethod) {
  throw new HTTPException(503, { message: 'Schema sync unavailable; please retry' });
}
```
🟡 Minor (1)
Inline Comments:
- 🟡 Minor: `branchScopedDb.ts:111` — Retry backoff lacks jitter; add randomization to prevent a thundering herd
💭 Consider (1)
💭 1) errorHandler.ts:94-104 Original error object no longer logged directly
Issue: The previous implementation logged error: err directly. The new implementation extracts specific fields (message, stack, dbError) but no longer includes the full error object, which could lose properties that aren't explicitly extracted (e.g., custom properties from third-party libraries like HTTP client errors with response, config fields).
Why: While pg errors are now better handled, other error types may lose context. Low impact since the critical pg properties are now captured.
Fix: Consider keeping the original error object for non-pg errors, or accept this tradeoff since the extracted fields cover the most common debugging needs.
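One hedged middle ground, if keeping third-party context matters: spread the error (which preserves enumerable custom properties) and copy the non-enumerable standard fields explicitly. `toLoggable` is a hypothetical helper sketch, not code from this PR:

```ts
// Enumerable extras (e.g. an HTTP client error's status) survive the
// spread; message and stack are non-enumerable and must be copied by hand.
function toLoggable(err: Error): Record<string, unknown> {
  return {
    ...err,
    message: err.message,
    stack: err.stack,
  };
}

const httpErr = Object.assign(new Error('request failed'), { status: 502 });
const logged = toLoggable(httpErr);
console.log(logged.status, logged.message); // 502 'request failed'
```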
🧹 While You're Here (1)
🧹 1) branchScopedDb.ts:63-70 Silent fallback to manageDbClient masks infrastructure problems
Issue: When getPoolFromClient() returns null (line 65), the code logs an error but falls back to using manageDbClient directly and continues processing. This masks a fundamental infrastructure problem: the database client doesn't have the expected structure. Branch scoping will not work correctly.
Why: If pool extraction fails in production, every subsequent request will silently bypass branch scoping, potentially causing data isolation violations. This is pre-existing behavior, not introduced by this PR, but worth noting since you're already improving error handling in this file.
Fix: Consider throwing a 503 instead of silently proceeding:

```ts
if (!pool) {
  logger.error({}, 'Could not get connection pool from dbClient - branch scoping unavailable');
  throw new HTTPException(503, { message: 'Database connection pool unavailable' });
}
```
💡 APPROVE WITH SUGGESTIONS
Summary: This PR effectively addresses a real operational pain point — silent schema sync failures and invisible pg errors were causing 500s that were difficult to diagnose. The retry logic and serializeCause() helper are solid improvements. The main suggestion is to consider adding some form of feedback (header, span attribute, or controlled failure) when schema sync fails so that downstream errors can be correlated with the root cause. The jitter suggestion is a quick win for production reliability.
Discarded (1)

| Location | Issue | Reason Discarded |
|---|---|---|
| errorHandler.ts:54-78 | serializeCause has bounded recursion depth (positive observation) | Not an issue — this was a validation that the implementation correctly bounds recursion at depth 3 to prevent stack overflow |
Reviewers (3)

| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| pr-review-errors | 3 | 1 | 1 | 1 | 0 | 0 | 0 |
| pr-review-sre | 4 | 0 | 0 | 0 | 1 | 0 | 1 |
| pr-review-standards | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 7 | 1 | 1 | 1 | 1 | 0 | 1 |
Note: The schema sync failure findings from pr-review-errors and pr-review-sre were merged, as they identified the same issue from different angles.
```ts
  },
  'Schema sync not performed, retrying'
);
await new Promise((resolve) => setTimeout(resolve, 50 * syncAttempts));
```
🟡 Minor: Retry backoff lacks jitter
Issue: The retry delay uses linear backoff (50 * syncAttempts) without jitter. When multiple concurrent requests hit lock contention simultaneously, they'll all retry at the same intervals (50ms, 100ms), creating synchronized retry waves that can perpetuate contention.
Why: The codebase already uses jitter in similar patterns (withRetry.ts uses Math.random() * expDelay). Lock contention scenarios are particularly susceptible to thundering herd problems when retries are synchronized.
Fix:
```diff
- await new Promise((resolve) => setTimeout(resolve, 50 * syncAttempts));
+ await new Promise((resolve) => setTimeout(resolve, 50 * syncAttempts * (1 + Math.random() * 0.5)));
```
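The suggested jitter math can also be factored into a tiny helper for testability. This is a sketch of the same linear-backoff-plus-jitter calculation; `backoffDelayMs` is a hypothetical name, not an existing function in the codebase:

```ts
// Attempt n waits uniformly in [50n, 75n) ms, so concurrent retries spread
// out instead of re-colliding on the schema-sync lock in synchronized waves.
function backoffDelayMs(attempt: number, baseMs = 50): number {
  return baseMs * attempt * (1 + Math.random() * 0.5);
}

// Usage: await new Promise((r) => setTimeout(r, backoffDelayMs(syncAttempts)));
```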
Ito Test Report ✅

16 test cases ran: 12 passed, 4 additional findings. The end-to-end/local API tests covered baseline routed reads (Agents/Triggers/Settings), branch list/create with duplicate/conflict handling, malformed-input rejection, deletion safety, stale deep-link handling, auth-boundary checks, forced-503 sanitization, and post-stress recovery; all behaved as expected, with no observed data leakage or broad stability regressions.

Key defects:
- High severity: Project Settings save blocker caused by UI edit-form ID regex validation on a disabled field (also invalidating two-tab overlap expectations).
- High severity: concurrent branch-scoped PATCH bursts return 500s when schema-sync lock retries are exhausted and requests proceed unsynced.
- Medium severity: mobile scheduled-trigger creation fails with a 500 when invalid browser timezone defaults are not validated or caught.

✅ Passed (12) | ℹ️ Additional Findings (4)
Summary

- `branchScopedDbMiddleware` never inspected the `checkoutBranch` result. When sync failed (lock contention, merge conflicts from a recent DROP TABLE migration), queries ran against branches with outdated schemas (missing columns like `is_work_app`), producing 500 errors.
- pg Error properties (`message`, `code`, `severity`) are non-enumerable, so `JSON.stringify()` produced `cause: {}` in logs, hiding the actual database error.
- Added `serializeCause()` to the error handler, which traverses the error cause chain and extracts the non-enumerable pg Error properties (`code`, `message`, `severity`, `detail`, `hint`).

Operational note

Existing branches that are already out of sync need a one-time manual fix.

Test plan

- Typecheck passes (`tsc --noEmit`)
- `dbError` field in error logs now contains actual pg error details instead of `{}`
- Ran `pnpm db:manage:sync-all-branches` against production DoltgreSQL

Made with Cursor