A workflow engine has a different responsibility than an ordinary web request. A request should validate intent, record state, and return quickly. A workflow may need to execute steps, call tools, wait for approvals, retry failures, and write durable execution history over time.

That distinction leads to an important architecture decision: workflows should not run as one long-lived request inside the web application or API runtime. The API should create an execution job, persist the workflow run, and hand the work to managed worker infrastructure.
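
The request side of that split can be sketched as follows. This is a minimal illustration, not a definitive implementation: the `runs` and `jobs` tables, their columns, the status values, and the `handle_run_now` function are all hypothetical names, and SQLite stands in for whatever durable store the product actually uses.

```python
import sqlite3
import uuid


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per workflow run, one row per execution job.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS runs (
            id TEXT PRIMARY KEY,
            workflow_id TEXT NOT NULL,
            status TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS jobs (
            id TEXT PRIMARY KEY,
            run_id TEXT NOT NULL REFERENCES runs(id),
            status TEXT NOT NULL
        );
        """
    )


def handle_run_now(conn: sqlite3.Connection, workflow_id: str) -> dict:
    """Record the run and its job, then return immediately.

    No workflow step executes here; a worker picks up the job later.
    """
    run_id = str(uuid.uuid4())
    job_id = str(uuid.uuid4())
    with conn:  # both inserts commit together, or neither does
        conn.execute(
            "INSERT INTO runs (id, workflow_id, status) VALUES (?, ?, 'queued')",
            (run_id, workflow_id),
        )
        conn.execute(
            "INSERT INTO jobs (id, run_id, status) VALUES (?, ?, 'pending')",
            (job_id, run_id),
        )
    return {"run_id": run_id, "job_id": job_id, "status": "queued"}
```

Writing the run and the job in one transaction means the product never shows a queued run that has no job behind it.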

This keeps the interactive product responsive. A user can click "Run now", see that the workflow has been queued, and continue using the product while the backend work proceeds independently.

The worker path also gives the system a better failure boundary. If a workflow step fails, times out, or hits a transient dependency problem, the failure is recorded against the execution job and its run record. It should not threaten the health of the main application process.
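
One way to picture that boundary is a worker loop that converts step exceptions into recorded state. The sketch below assumes a hypothetical `RunRecord` type and `execute_run` function, and keeps the record in memory only for brevity; a real worker would write it to durable storage.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class RunRecord:
    # Hypothetical run record; field names are illustrative.
    run_id: str
    status: str = "queued"
    error: Optional[str] = None
    completed_steps: List[str] = field(default_factory=list)


def execute_run(run: RunRecord, steps: List[Tuple[str, Callable[[], None]]]) -> RunRecord:
    """Run steps in order; a failing step marks the run failed and stops."""
    run.status = "running"
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # The failure is captured on the run record instead of
            # propagating out of the worker.
            run.status = "failed"
            run.error = f"{name}: {exc}"
            return run
        run.completed_steps.append(name)
    run.status = "succeeded"
    return run
```

Because the worker is a separate process from the API, even a crash that escapes this handler affects only the job being executed.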

A job-driven model also makes scheduling cleaner. Manual runs, scheduled runs, approval continuations, and retries can all enter the same execution path: create a job, let a worker claim or receive it, update run and step state, and exit when the assigned work is done.
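
The shared execution path might look like the sketch below, where every trigger type goes through the same `enqueue` call and workers claim jobs the same way regardless of origin. `Trigger`, `Job`, and `JobQueue` are hypothetical names, and the in-memory list is a stand-in for a durable queue or jobs table.

```python
import itertools
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Trigger(Enum):
    MANUAL = "manual"
    SCHEDULED = "scheduled"
    APPROVAL_CONTINUATION = "approval_continuation"
    RETRY = "retry"


@dataclass
class Job:
    job_id: int
    workflow_id: str
    trigger: Trigger
    status: str = "pending"


class JobQueue:
    """In-memory stand-in for a durable job queue (illustrative only)."""

    def __init__(self) -> None:
        self._jobs: List[Job] = []
        self._ids = itertools.count(1)

    def enqueue(self, workflow_id: str, trigger: Trigger) -> Job:
        # Single entry point: the trigger is recorded as data,
        # not handled by a separate code path.
        job = Job(next(self._ids), workflow_id, trigger)
        self._jobs.append(job)
        return job

    def claim_next(self) -> Optional[Job]:
        # Workers claim the oldest pending job, whatever created it.
        for job in self._jobs:
            if job.status == "pending":
                job.status = "claimed"
                return job
        return None
```

Keeping the trigger as a field on the job preserves the information for auditing without letting it change how the job is executed.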

This avoids a common trap where scheduled workflows behave differently from manual workflows because the scheduler runs business logic directly. The scheduler should select due workflows and create jobs. It should not become a second workflow runner.
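
A scheduler limited to that role could be as small as the sketch below. The `Schedule` type, the `scheduler_tick` function, and the `enqueue` callback are assumptions made for illustration; the point is that the tick only selects due workflows and creates jobs, and never executes a step.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, Callable, List


@dataclass
class Schedule:
    # Hypothetical schedule record for one workflow.
    workflow_id: str
    next_run_at: datetime
    interval: timedelta


def scheduler_tick(
    schedules: List[Schedule],
    now: datetime,
    enqueue: Callable[[str, str], Any],
) -> List[Any]:
    """Create one job per due schedule and advance its next run time."""
    created = []
    for schedule in schedules:
        if schedule.next_run_at <= now:
            # Hand off to the same enqueue path that manual runs use.
            created.append(enqueue(schedule.workflow_id, "scheduled"))
            schedule.next_run_at = now + schedule.interval
    return created
```

Advancing `next_run_at` from `now` rather than from the previous due time is a simplification here; it skips missed intervals instead of replaying them, which may or may not suit a given product.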

Retries should follow the same principle. A retry should create a fresh execution job rather than trying to mutate a running worker process. That makes retry behavior easier to reason about, easier to audit, and easier to show in the product.

The original failed run should remain visible. The retry should be explicit. If the retry creates a new run, it should link back to the original so operators can understand the sequence of attempts.
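
The retry rules above can be sketched with a linked run record. The `Run` type, its `retry_of` field, and the two helper functions are hypothetical; the dictionary stands in for a durable runs table.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Run:
    run_id: str
    workflow_id: str
    status: str
    retry_of: Optional[str] = None  # link back to the run being retried


def retry_run(runs: Dict[str, Run], failed_run_id: str, new_run_id: str) -> Run:
    """Create a new queued run linked to the failed one; the original is untouched."""
    original = runs[failed_run_id]
    if original.status != "failed":
        raise ValueError(f"run {failed_run_id} is not failed; nothing to retry")
    retry = Run(
        run_id=new_run_id,
        workflow_id=original.workflow_id,
        status="queued",
        retry_of=original.run_id,
    )
    runs[new_run_id] = retry
    return retry


def attempt_chain(runs: Dict[str, Run], run_id: str) -> List[str]:
    """Walk retry_of links back to the first attempt, returned oldest first."""
    chain: List[str] = []
    current = runs.get(run_id)
    while current is not None:
        chain.append(current.run_id)
        current = runs.get(current.retry_of) if current.retry_of else None
    return list(reversed(chain))
```

Making `Run` frozen is a deliberate choice in this sketch: a retry cannot overwrite the failed run by accident, so the original failure stays visible to operators.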

The practical design rule is simple: important workflow state should not live only in memory. Every run, step, pause, continuation, failure, retry, and outcome should be represented as durable records that the product can inspect.
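
As a rough illustration of that rule, the sketch below appends every state change to a table and reads the history back after the original connection is closed, which approximates a worker restart. The `run_events` table, its columns, and the event names are assumptions, and SQLite is used only because it ships with Python.

```python
import os
import sqlite3
import tempfile
from typing import List, Optional, Tuple


def open_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Hypothetical append-only history: one row per state change.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS run_events (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            run_id TEXT NOT NULL,
            step TEXT,
            event TEXT NOT NULL
        )
        """
    )
    conn.commit()
    return conn


def record_event(
    conn: sqlite3.Connection, run_id: str, event: str, step: Optional[str] = None
) -> None:
    with conn:  # commit each event as it happens
        conn.execute(
            "INSERT INTO run_events (run_id, step, event) VALUES (?, ?, ?)",
            (run_id, step, event),
        )


def history(conn: sqlite3.Connection, run_id: str) -> List[Tuple[Optional[str], str]]:
    rows = conn.execute(
        "SELECT step, event FROM run_events WHERE run_id = ? ORDER BY seq",
        (run_id,),
    )
    return [(step, event) for step, event in rows]


def demo_survives_restart() -> List[Tuple[Optional[str], str]]:
    """Write events, close the connection, reopen, and read the history back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "runs.db")
        conn = open_store(path)
        record_event(conn, "run-1", "started")
        record_event(conn, "run-1", "paused_for_approval", step="review")
        conn.close()  # simulate the worker process going away

        reopened = open_store(path)
        try:
            return history(reopened, "run-1")
        finally:
            reopened.close()
```

Nothing about the paused run depends on the first connection or on any in-memory object, so a different worker can resume from the recorded history.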