Retry Logic and Error Handling for Automated Scripts
Transient Failures Are the Rule, Not the Exception
API endpoints return 429 rate-limit errors. Network requests time out. A remote server restarts during a file transfer. A database connection drops for three seconds and comes back.
These are transient failures — temporary problems that resolve themselves if you just try again. But most automation scripts don’t try again. They hit the error, print a stack trace, and exit. The entire pipeline stops because of a momentary hiccup.
The standard fix is wrapping every call in retry logic: try/except blocks, backoff timers, retry counters, maximum attempt limits. This code is repetitive, error-prone, and has nothing to do with the actual task the script performs. It’s defensive boilerplate that bloats every script in the pipeline.
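To make the cost of that boilerplate concrete, here is a minimal sketch of the kind of retry wrapper a script ends up carrying. The function name `call_with_retry` and its parameters are illustrative assumptions, not part of any particular library or of NORA.

```python
import time


def call_with_retry(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on any exception, wait with exponential backoff and retry.

    Hypothetical helper for illustration. The delay before retry n is
    base_delay * 2**n, so the defaults wait 1s, 2s, 4s, 8s.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
```

Every call that can fail transiently needs to be routed through something like this, and every script needs its own copy or a shared dependency.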
NORA handles retry logic, error routing, and timeout management at the workflow level — outside the scripts — so scripts can focus on their actual job.
Per-Node Retry with Exponential Backoff
Every node in a NORA workflow can be configured with retry settings:
- Retry count — how many times to re-execute the node after a failure
- Exponential backoff — each retry waits longer than the last (e.g., 1s, 2s, 4s, 8s), reducing pressure on rate-limited APIs
- Configurable timeouts — set a maximum execution time per node; if the script exceeds it, NORA kills the process and treats it as a failure
These settings are configured per node, not per workflow. A node calling a flaky API can retry five times with backoff, while a node running a local file operation retries zero times. Each node gets the error handling strategy appropriate to its task.
None of this requires code changes to the scripts themselves: the retry logic lives in the workflow configuration.
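The behavior described above can be sketched as a small runner that applies per-node settings to an unmodified command. This is an illustration under stated assumptions, not NORA's implementation: `RetrySettings`, its field names, and `run_node` are hypothetical, and the article does not show NORA's actual configuration format.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetrySettings:
    # Hypothetical field names; NORA's real configuration keys are not shown here.
    retries: int = 0                 # re-executions allowed after the first failure
    base_delay: float = 1.0          # seconds to wait before the first retry
    timeout: Optional[float] = None  # maximum seconds per attempt, None = unlimited


def backoff_delays(settings):
    """Delays before each retry: base_delay, 2x, 4x, 8x, ..."""
    return [settings.base_delay * 2 ** i for i in range(settings.retries)]


def run_node(command, settings, sleep=time.sleep):
    """Run command with the node's settings. Returns (succeeded, attempts_used)."""
    delays = backoff_delays(settings)
    for attempt in range(settings.retries + 1):
        try:
            result = subprocess.run(command, timeout=settings.timeout)
            failed = result.returncode != 0
        except subprocess.TimeoutExpired:
            failed = True  # subprocess.run kills the child when the timeout expires
        if not failed:
            return True, attempt + 1
        if attempt < settings.retries:
            sleep(delays[attempt])
    return False, settings.retries + 1


if __name__ == "__main__":
    failing = [sys.executable, "-c", "raise SystemExit(1)"]
    print(run_node(failing, RetrySettings(retries=2, base_delay=0.01)))
```

The command being run knows nothing about retries or timeouts, which is the point: the policy sits in the settings object rather than in the script.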
Kill Button for Hung Processes
Sometimes a script doesn’t fail — it just hangs. A network call that never times out, a subprocess waiting on input that will never arrive, an infinite loop triggered by unexpected data.
NORA provides a kill button for every running node. Click it, and NORA terminates the node's entire process tree (using tree-kill on Windows), so child processes are cleaned up along with the parent. NORA tracks each node's PID so that the kill targets the right process.
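For readers who want to see what killing a process tree involves, here is a rough Python sketch of the same idea using the Windows `taskkill` command with its `/T` (include child processes) and `/F` (force) flags. It is not NORA's code, and the function names are hypothetical.

```python
import subprocess
import sys


def taskkill_command(pid):
    """Build the Windows command that force-kills a process and its children."""
    # /T = also terminate child processes, /F = force termination
    return ["taskkill", "/PID", str(int(pid)), "/T", "/F"]


def kill_process_tree(pid):
    """Terminate pid and its descendants. This sketch covers Windows only."""
    if sys.platform != "win32":
        raise NotImplementedError("this sketch only covers Windows")
    result = subprocess.run(taskkill_command(pid), capture_output=True, text=True)
    return result.returncode == 0
```

Killing only the parent PID would leave any subprocesses it spawned running, which is why the tree-wide flag matters.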
stopOnError: Halt or Continue
Workflows handle errors in one of two modes:
- stopOnError enabled: when any node fails (after exhausting its retries), the entire workflow halts and subsequent nodes do not execute. This is the safe default for pipelines where each step depends on the previous one.
- stopOnError disabled: when a node fails, the workflow logs the error and continues to the next node. This is useful for workflows with independent tasks where one failure shouldn’t block the others.
This flag can be set per workflow and per schedule. A workflow might run with stopOnError enabled during a critical nightly job but disabled during a daytime test run.
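The two modes can be illustrated with a minimal sequential runner. This is a sketch of the semantics only; `run_workflow` and its arguments are hypothetical names, and each node is modeled as a callable that returns an exit code.

```python
def run_workflow(nodes, stop_on_error=True):
    """Run nodes in order. nodes is a list of (name, callable -> exit code).

    Returns a log of (name, status) for every node that actually ran.
    """
    log = []
    for name, run in nodes:
        exit_code = run()
        if exit_code == 0:
            log.append((name, "ok"))
            continue
        log.append((name, "failed"))
        if stop_on_error:
            break  # halt: later nodes never execute
    return log
```

With `stop_on_error=True` a failure in the second of three nodes leaves the third unexecuted; with `False` the third still runs and the failure is only recorded.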
Error Routing with Condition Nodes
For more granular control than stop-or-continue, Condition nodes route execution based on what happened upstream:
- Exit code routing — a condition node checks the exit code of the previous node and sends execution down the success path (exit 0) or the error path (non-zero)
- Output matching — check stderr or stdout for specific error messages using substring match or regex
- Numeric threshold — if a script outputs a count or metric, route based on whether it exceeds a limit
This turns error handling into a visual, editable part of the workflow. The error recovery path is visible on the canvas — not buried inside a script’s exception handler.
Example: A data ingestion workflow runs a Python script that pulls records from an API. A condition node checks the exit code. On success, the next node processes the data. On failure, a different branch sends an alert notification and writes the error to a log file. Both paths are visible on the canvas, both are editable, and neither requires modifying the original Python script.
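The three kinds of check listed above are simple to express in code, which may help when deciding what a condition node should test. The sketch below is illustrative only: the function names are hypothetical, and the rule of parsing the first number found in stdout is an assumption, not documented NORA behavior.

```python
import re


def route_by_exit_code(exit_code):
    """Exit code routing: 0 goes down the success path, anything else the error path."""
    return "success" if exit_code == 0 else "error"


def output_matches(text, pattern, use_regex=False):
    """Output matching: plain substring test, or a regex search when requested."""
    if use_regex:
        return re.search(pattern, text) is not None
    return pattern in text


def exceeds_threshold(stdout, limit):
    """Numeric threshold: compare the first number found in stdout with limit."""
    match = re.search(r"-?\d+(?:\.\d+)?", stdout)
    if match is None:
        return False  # no number in the output, so nothing to compare
    return float(match.group()) > limit
```

For the ingestion example, `route_by_exit_code` decides between the processing branch and the alert branch, while `output_matches(stderr, "429")` could separate rate-limit failures from other errors.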
Continue From Next Node After Failure
When a workflow stops due to an error, NORA doesn’t force a full restart. You can continue execution from the next node in the sequence with a click. This is useful when a failure is manually resolved (e.g., restarting a service that was down) and the remaining nodes should still run.
Loop Support for Retry Patterns
NORA’s loop edges (loop:N) re-execute a path up to N times with iteration memory — the loop tracks which iteration it’s on and can use that information in condition evaluations.
A safety cap of 50 iterations prevents runaway loops. If a loop hits 50 without meeting its exit condition, NORA stops it and logs the event.
This is useful for polling patterns: run a check, evaluate the result with a condition node, and loop back if the desired state hasn’t been reached yet — up to a defined limit.
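A bounded polling loop of this kind can be sketched as follows. The names `poll` and `SAFETY_CAP` are hypothetical, and clamping a larger requested limit down to 50 is an assumption about how the cap interacts with loop:N.

```python
SAFETY_CAP = 50  # the hard limit on loop iterations described above


def poll(check, max_iterations, interval=0.0, sleep=lambda seconds: None):
    """Call check(iteration) until it returns True or the limit is reached.

    The iteration number is passed to check, mirroring the idea of
    iteration memory. Returns (reached, iterations_used).
    """
    limit = min(max_iterations, SAFETY_CAP)
    for iteration in range(1, limit + 1):
        if check(iteration):
            return True, iteration
        if iteration < limit:
            sleep(interval)  # wait before looping back
    return False, limit
```

In real use, pass `sleep=time.sleep` and a nonzero `interval`; the default no-op keeps the sketch instant to run.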
Real-Time Error Visibility
When a node fails, NORA displays stderr output directly on the node in the canvas. No switching to a terminal, no opening a log file — the error message is visible immediately where the failure occurred.
The execution history records error details for every run: which node failed, what the exit code was, what stderr contained, and how long the node ran before failing. This history persists across sessions for later review.
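As a rough picture of what one history entry holds, the sketch below runs a command and captures the four details listed above. `RunRecord` and `run_and_record` are hypothetical names for illustration; NORA's actual storage format is not described here.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class RunRecord:
    node: str                # which node ran
    exit_code: int           # process exit code (non-zero means failure)
    stderr: str              # captured error output
    duration_seconds: float  # how long the node ran


def run_and_record(node, command):
    """Run command, capturing stderr and wall-clock duration."""
    started = time.monotonic()
    result = subprocess.run(command, capture_output=True, text=True)
    elapsed = time.monotonic() - started
    return RunRecord(node, result.returncode, result.stderr, elapsed)


if __name__ == "__main__":
    # sys.exit with a string prints it to stderr and exits with code 1
    failing = [sys.executable, "-c", "import sys; sys.exit('boom')"]
    print(run_and_record("ingest", failing))
```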
Getting Started
NORA runs on Windows 10 and later. Download it from software.reibuys.com/nora and install it. A paid license key is required: it is a one-time purchase with no subscription or recurring charges, and it comes with a 30-day money-back guarantee.