feelsfast.fyi
Scenarios

Tool execution / agentic step

When an agent does something — reads a file, runs a search, edits code, executes a query — there is a 1 – 10 s window of real wall-clock time the perception layer has to manage. The work is not waiting on a model token stream; it is waiting on the tool itself. A spinner that says "Working…" for six seconds gives the user nothing. A streaming list of the actual tool calls — "Read package.json. Searched for useState. Edited SearchBox.tsx. Ran typecheck." — is the same wait, but the user reads the trajectory in real time and learns to trust the agent.

This is the perception scenario where AI agents either build trust or don't. Tool-call transparency is what makes Cursor's agent mode, Claude Code, and v0's iteration loops feel collaborative rather than oracular. The user does not just see the result; they see the work.

This scenario is in the 1 – 10 s band Miller 1968 describes for unit-task latency. Each individual tool-call status update has to clear the Card, Moran & Newell 1983 ~100 ms perceptual frame so the stream registers as live, not as a series of state mutations. The Doherty 1982 ~400 ms productivity break covers the time-to-first-tool-call — past it, the agent feels stalled before it has even started. The Myers 1985 finding that progress indicators outperform blank waits extends to non-numeric progress: a list of completed tool calls is a percent-done indicator with the percent implied by the count.

Tool execution

An agent reads a file, searches the codebase, edits a file, and runs a typecheck. Naive: opaque "Working…" spinner for the full duration. Tuned: each tool call streams in as the agent runs it, with its own running / done state.

1 – 10 s


What is happening in the demo

A simulated agent executes four tool calls in sequence: read package.json, search for useState across the source tree, edit SearchBox.tsx, run typecheck. Each step has its own gamma-jittered duration around the medians declared in the demo config (350 ms, 1.1 s, 700 ms, 1.5 s — total ~3.65 s p50). Click Run agent on either side.
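The demo's source is not shown on this page, but a minimal sketch of how a config like that could be declared, with a cheap gamma-style jitter, looks something like this. The field names and the gammaJitter helper are assumptions for illustration, not the demo's actual code.

```ts
// Hypothetical demo config: four tool calls with their median durations (ms).
type ToolCallSpec = { label: string; medianMs: number };

const toolCalls: ToolCallSpec[] = [
  { label: "Read package.json", medianMs: 350 },
  { label: "Search for useState", medianMs: 1100 },
  { label: "Edit SearchBox.tsx", medianMs: 700 },
  { label: "Run typecheck", medianMs: 1500 },
]; // p50 total ≈ 3.65 s

// Cheap gamma-style jitter: sum of k exponential draws, scaled so the mean
// lands on medianMs (the median sits slightly below). Enough to make
// repeated runs feel organic rather than metronomic.
function gammaJitter(medianMs: number, k = 4): number {
  let sum = 0;
  for (let i = 0; i < k; i++) sum += -Math.log(1 - Math.random());
  return (medianMs / k) * sum;
}
```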

The naive side disables the button, shows a generic Loader2 spinner with "Working…" text for the full ~3.65-second duration, then renders the final summary. The user has no signal of what the agent is doing at any moment. If a step hangs (real models do this), they cannot tell whether to keep waiting or stop. Cancellation is missing because there is nothing to cancel against — the only handle is "the whole thing."

The tuned side renders each tool call as a discrete row as the agent runs it. The currently-executing step has a primary-coloured spinner, the steps already done have a green check plus their result detail, the steps not yet started are dimmed circles. A cancel button appears next to Run agent while the agent is running — the user can stop at any point, and they can stop because they can see what is currently happening. Cancellation that does not show what is being cancelled is barely cancellation at all.
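One way the tuned rows could be rendered, sketched with lucide-react icons and Tailwind-style classes; the ToolCallState shape, class names, and component name are illustrative rather than the demo's actual code.

```tsx
import { CheckCircle2, Circle, Loader2 } from "lucide-react";

// Illustrative state for one tool call in the stream.
type ToolCallState = {
  label: string;                          // "Searched for useState"
  detail?: string;                        // result snippet, filled in once done
  status: "pending" | "running" | "done";
};

function ToolCallRow({ call }: { call: ToolCallState }) {
  return (
    <li className="flex items-center gap-2">
      {call.status === "running" && <Loader2 className="h-4 w-4 animate-spin text-primary" />}
      {call.status === "done" && <CheckCircle2 className="h-4 w-4 text-green-600" />}
      {call.status === "pending" && <Circle className="h-4 w-4 opacity-40" />}
      <span className={call.status === "pending" ? "opacity-50" : ""}>
        {call.label}
        {call.detail && <span className="text-muted-foreground"> · {call.detail}</span>}
      </span>
    </li>
  );
}
```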

The visual register here is the same as a long-running command's terminal output — npm install printing each package as it resolves, git fetch showing the count of objects received. The user trusts the work because they see the work.

What to tune

  1. Pre-action — submit button echo within ~50 ms; thinking dots cover the model's reasoning step before the first tool call.
  2. First 1 s — first tool-call row appears. The list is the affordance; no spinner over the whole agent.
  3. 1 – 10 s — each tool call streams in as it runs. Status icon, label, result snippet (not just a count); the driver loop behind this is sketched after the list.
  4. Cancellation — visible while the agent runs. Boundary is the next tool call, not an immediate hard stop. "Stopping after current step…" is the honest message.
  5. Completion — explicit summary row. For multi-call work past 10 s, hand off to background sync; see the agentic workflow scenario.
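A sketch of the driver loop behind items 2, 3, and 5: each tool call's status is pushed to the list as it starts and finishes, and an explicit summary row lands at the end. The Tool shape and callback names are assumptions, not the demo's actual code; cancellation is handled separately further down.

```ts
// Illustrative driver: stream per-call status to the UI, then summarise.
type Tool = { label: string; run: () => Promise<string> }; // resolves to a result snippet

async function runAgent(
  tools: Tool[],
  onUpdate: (index: number, status: "running" | "done", detail?: string) => void,
  onSummary: (text: string) => void,
) {
  for (let i = 0; i < tools.length; i++) {
    onUpdate(i, "running");              // the row appears the moment the call starts
    const detail = await tools[i].run(); // the real work
    onUpdate(i, "done", detail);         // green check plus result snippet
  }
  onSummary(`Completed ${tools.length} tool calls`); // item 5: explicit summary row
}
```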

When perceived performance hurts you here

Tool-call transparency only helps if the names and details are honest. An agent that surfaces "Reading user data" but is actually scanning the entire database is showing the user a misleading abstraction — the trust budget the transparency paid into gets withdrawn when the user notices the gap. If you cannot describe the tool call accurately in a short label, the answer is to redesign the tool call, not to hide it behind a generic name.

The other failure mode is the firehose. A tool that fires fifty individual sub-calls — say, "read each of these 50 files individually" — should not surface every one. Collapse to "Reading 50 files (12 / 50)" with a determinate counter. The principle is legible progress, not granular progress.
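One way that collapsing could work, assuming the fifty reads run sequentially and the row re-renders on each completion; the helper and callback names are hypothetical.

```ts
// Hypothetical: collapse a burst of sub-calls into one determinate counter row.
async function readManyFiles(
  paths: string[],
  readFile: (path: string) => Promise<string>,
  onProgress: (label: string) => void,
) {
  let done = 0;
  for (const path of paths) {
    await readFile(path); // individual sub-call, never surfaced on its own
    done++;
    onProgress(`Reading ${paths.length} files (${done} / ${paths.length})`);
  }
  onProgress(`Read ${paths.length} files`);
}
```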

For tools that do return data the user can see — search results, code diffs, query rows — surface a snippet of the result in the tool-call row, not just the count. "Found 14 matches" reads as progress; "Found 14 matches in app/auth/, lib/session/, …" reads as understanding.
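A small formatting helper in that spirit; the names and the truncation rule are one reasonable choice, not anything this demo ships.

```ts
// "Found 14 matches" -> "Found 14 matches in app/auth/, lib/session/, …"
function formatSearchDetail(matchCount: number, dirs: string[], maxDirs = 2): string {
  const shown = dirs.slice(0, maxDirs).join(", ");
  const more = dirs.length > maxDirs ? ", …" : "";
  return `Found ${matchCount} matches in ${shown}${more}`;
}
```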

Cancellation needs a graceful path. Hard-killing an agent mid-tool-call can leave files half-edited, transactions uncommitted, branches in inconsistent states. The right cancellation boundary is between tool calls — let the current one finish, abort before the next one starts. Surface the difference: "Stopping after current step…" vs. "Stop now."
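A sketch of the two stop modes, assuming the soft stop is checked at the tool-call boundary and the hard stop rides an AbortSignal that each tool is expected to honour; the shapes and names are illustrative.

```ts
type CancellableTool = {
  label: string;
  run: (signal: AbortSignal) => Promise<string>; // should reject promptly on abort
};

async function runWithCancellation(
  tools: CancellableTool[],
  softStop: { requested: boolean },  // set by "Stopping after current step…"
  hardStop: AbortController,         // aborted by "Stop now"
  onStatus: (msg: string) => void,
) {
  for (const tool of tools) {
    if (softStop.requested) {
      onStatus("Stopped before next step"); // graceful boundary: current call already finished
      return;
    }
    onStatus(`Running ${tool.label}…`);
    await tool.run(hardStop.signal);        // hard stop interrupts mid-call, soft stop waits
  }
  onStatus("All steps complete");
}
```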

Accessibility

  1. aria-live="polite" on the tool-call list so screen readers announce each new step. Reserve aria-live="assertive" for failures or cancellation.
  2. aria-busy="true" on the running row, removed when it completes — the same pattern as a determinate progress bar but at the tool-call granularity.
  3. role="status" on the cancel button's "Stopping…" message so the cancellation is announced.
  4. Focus management — when the user clicks Cancel, focus stays on the cancel button (now showing the cancellation in progress) until the agent confirms the stop.
  5. prefers-reduced-motion — the spinner reduces to a static dot indicator; the appearance / disappearance of rows is instantaneous instead of animated. A markup sketch combining these points follows.
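A sketch of how those attributes could sit together in the markup. usePrefersReducedMotion is a hypothetical local hook, not a library API, and the row rendering is simplified to a dot for everything except the running spinner.

```tsx
import { useEffect, useState } from "react";
import { Loader2 } from "lucide-react";

type Call = { label: string; status: "pending" | "running" | "done" };

// Hypothetical local hook wrapping the prefers-reduced-motion media query.
function usePrefersReducedMotion(): boolean {
  const [reduced, setReduced] = useState(false);
  useEffect(() => {
    const mq = window.matchMedia("(prefers-reduced-motion: reduce)");
    setReduced(mq.matches);
    const onChange = (e: MediaQueryListEvent) => setReduced(e.matches);
    mq.addEventListener("change", onChange);
    return () => mq.removeEventListener("change", onChange);
  }, []);
  return reduced;
}

function ToolCallList({ calls, stopping }: { calls: Call[]; stopping: boolean }) {
  const reducedMotion = usePrefersReducedMotion();
  return (
    <>
      {/* polite: each new row is announced without interrupting the user */}
      <ul aria-live="polite">
        {calls.map((call, i) => (
          <li key={i} aria-busy={call.status === "running" || undefined}>
            {call.status === "running" && !reducedMotion ? (
              <Loader2 className="h-4 w-4 animate-spin" />
            ) : (
              <span aria-hidden>•</span>
            )}{" "}
            {call.label}
          </li>
        ))}
      </ul>
      {/* announced once when the user clicks Cancel */}
      {stopping && <p role="status">Stopping after current step…</p>}
    </>
  );
}
```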

References


  1. Miller 1968

    Miller, R. B. (1968). Response time in man-computer conversational transactions. Proceedings of the AFIPS Fall Joint Computer Conference, 33(I), 267–277. The 1 – 10 s band where multi-step agent execution typically lands.

  2. Card, Moran & Newell 1983

    Card, S. K., Moran, T. P., & Newell, A. (1983). The Psychology of Human-Computer Interaction. Lawrence Erlbaum. The ~100 ms perceptual frame each tool-call status update must clear so the streaming feels live.

  3. Doherty 1982

    Doherty, W. J., & Thadani, A. J. (1982). The Economic Value of Rapid Response Time. IBM Technical Report GE20-0752-0. Productivity drops sharply past ~400 ms — relevant for the time-to-first-tool-call budget on agent execution.

  4. Myers 1985

    Myers, B. A. (1985). The importance of percent-done progress indicators for computer-human interfaces. Proceedings of CHI '85, 11–17. ~86 % of participants preferred a percent-done indicator over a blank wait — extends to the streaming tool-call list as a non-numeric progress signal.