Replies: 1 comment
I think you're really dealing with two different timeout domains here, and that's why it feels confusing: the stream's lifetime and the step function's lifetime are governed separately.

Your heartbeat workaround makes sense for the stream side, but it does not change how long the underlying workflow step is allowed to run, so I would treat those as separate problems. For issue 1, my guess is the timeout logs for `/.well-known/workflow/v1/step` reflect multiple internal invocations of that endpoint rather than your one logical step, so I don't think the main cause is the try-catch-finally wrapper. For issue 2, if one single step can genuinely run longer than the platform max duration, then I wouldn't expect heartbeats or stream tricks to solve that. At that point the real options are architectural, e.g. splitting the long-running generation into smaller steps.
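For example, one architectural shape (my suggestion only — `runStep` is a hypothetical stand-in for whatever step primitive your workflow runtime actually exposes, not a real API):

```typescript
// Sketch of splitting one long step into one step per section.
// `runStep` is a hypothetical stand-in for the workflow runtime's step
// primitive; each call would be a separate invocation with its own
// maxDuration window instead of one 6+ minute function.
type StepRunner = <T>(name: string, fn: () => Promise<T>) => Promise<T>;

async function generateSections(
  runStep: StepRunner,
  outlines: string[],
  generate: (outline: string) => Promise<string>,
): Promise<string[]> {
  const sections: string[] = [];
  for (const [i, outline] of outlines.entries()) {
    // One durable step per section: a timeout or retry only redoes
    // this section, not the whole generation.
    sections.push(await runStep(`generate-section-${i}`, () => generate(outline)));
  }
  return sections;
}
```

That way no single invocation has to outlive the platform window, and a retry only replays the section that failed.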
So yeah, I would keep the conclusions for the two issues separate rather than hunt for a single root cause.

For your first issue, I'd also inspect whether what you're calling a "6 minute step" is actually one logical step composed of multiple internal step invocations / resumptions, because the repeated timeout entries on the workflow step endpoint make me think the platform/runtime boundaries are not lining up 1:1 with how you're mentally grouping the work. So my short answer: the stream timeout and the function timeout are different problems, and the repeated 600s entries are probably telling you how the runtime actually partitioned the work.
If this helps, feel free to mark it as the answer so others can find it faster.
as per #980, the recommended approach for durable streams is to use the provided agent, but what if we're not making chat requests?
my use case is using LLMs to generate/summarize content, rather than chat. the workflow looks more or less like a short pipeline in which step 4 generates *n* sections of content.

depending on the model, step 4 can take minutes. currently in testing we're observing anywhere from 4-8 minutes. for this, we've needed to increase the maxDuration of `/.well-known/workflow/v1/step` in our `vercel.ts`, which applies to all steps and not just this one workflow. it'd be cool if we were able to define `myStep.maxDuration = 600`, similar to how retries are defined.

this solved the vercel functions timing out on the step endpoint, but then the streams were timing out. to accommodate HTTP/2 stream timeouts, i had to inject a heartbeat into the stream.
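roughly, the heartbeat wrapper looks like this (a simplified sketch — the `withHeartbeat` name and SSE-comment heartbeat format are illustrative, not our exact code):

```typescript
// Wrap a text stream so a comment-style heartbeat chunk is emitted every
// `intervalMs` while the upstream source is quiet, keeping HTTP/2 proxies
// from closing an idle connection.
function withHeartbeat(
  source: ReadableStream<string>,
  intervalMs: number,
  heartbeat = ": ping\n\n", // SSE comment line; clients ignore lines starting with ":"
): ReadableStream<string> {
  return new ReadableStream<string>({
    async start(controller) {
      const timer = setInterval(() => controller.enqueue(heartbeat), intervalMs);
      const reader = source.getReader();
      try {
        for (;;) {
          const { done, value } = await reader.read();
          if (done) break;
          controller.enqueue(value);
        }
        controller.close();
      } catch (err) {
        controller.error(err);
      } finally {
        clearInterval(timer);
      }
    },
  });
}
```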
this all works fine. what's catching me out is two things:

1. our logs show repeated entries for `/.well-known/workflow/v1/step`; 7 of these are vercel function timeouts of 600s, despite the longest step only taking 6 minutes and the stream continuing to work just fine. the entire workflow fit within the 10 minute window, so there shouldn't be any function timeouts
2. a single section could plausibly grow large enough that generating it pushes one step past the platform max duration

for issue 1, my current assumption is that wrapping the entire workflow in try-catch-finally is what's causing the issue.
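the structure in question, reduced to a sketch (hypothetical names; the real workflow does more than this):

```typescript
// Hypothetical shape of the workflow described above: every step runs inside
// one outer try/finally, so cleanup in `finally` executes on success *and*
// failure -- which also means the function stays alive (and counting against
// maxDuration) until that cleanup finishes.
type Step = () => Promise<string>;

async function runWorkflow(steps: Step[], onCleanup: () => void): Promise<string[]> {
  const results: string[] = [];
  try {
    for (const step of steps) {
      results.push(await step());
    }
    return results;
  } finally {
    // Runs whether the steps succeeded or threw; any long-running work here
    // extends the function's lifetime past the last step.
    onCleanup();
  }
}
```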
for issue 2, this is a bit more complex: since we're essentially giving the LLM full control over what it generates, we don't have a reliable method of determining ahead of time whether a section is going to be large or not.
has anyone else run into this kind of thing?