feelsfast.fyi

AI · Long compute / Inference

The user submits a request that is going to take 30 seconds or more. Image generation. A complex analytical query. A long-context reasoning task. The user has crossed the unit-task boundary; static skeletons no longer carry their weight; engagement is the last move before they leave.

This scenario sits in the 10 S+ band. Block & Zakay's 1997 meta-analysis frames the trade-off: engagement compresses prospective duration while expanding retrospective duration; the design decision is whether you have made that trade deliberately. Fitch's Slack and FIFA examples are the canonical references; Myers 1985 is the determinate-progress fallback where the inference reports phases.

AI · Streaming response

A chat-style assistant returns a ~200-character answer. Naive: total wait, then the full response drops in. Tuned: ~600 ms thinking state, then tokens stream at a natural reading pace.


What is happening

The ai-streaming demo stands in. For a real long inference, the tuned flow stacks more layers:

  1. Thinking state during the time-to-first-token gap — the same dots-and-cursor pattern from ai-chat-streaming-response.
  2. Streaming render as soon as the first token arrives, paced to a natural reading rhythm.
  3. Tool-call transparency if the inference involves visible tool calls — narrate them ("Searching…", "Reading…", "Reasoning…").
  4. Determinate progress if the inference can report phases ("Step 3 of 7"); fall back to engagement otherwise.
  5. Cancellation always available — the stop button must respond inside the perceptual frame even if the abort takes longer.
  6. Background fallback — past 30–60 seconds, offer "do this in the background and notify me" so the user can leave the surface.
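
The layered flow above can be sketched as a small state machine. The state names, event names, and reducer shape here are illustrative assumptions, not real demo code from the site:

```typescript
// Illustrative state machine for the tuned long-inference flow.
// States and events are assumptions sketched from the list above.

type UiState =
  | { kind: "idle" }
  | { kind: "thinking" }                 // covers the time-to-first-token gap
  | { kind: "streaming"; text: string }  // tokens arriving, paced for reading
  | { kind: "background" }               // handed off past 30–60 s
  | { kind: "done"; text: string }
  | { kind: "cancelled" };

type UiEvent =
  | { kind: "submit" }
  | { kind: "token"; chunk: string }     // first token ends "thinking"
  | { kind: "complete" }
  | { kind: "cancel" }
  | { kind: "backgroundTimeout" };       // fired after the 30–60 s threshold

function reduce(state: UiState, event: UiEvent): UiState {
  // Cancellation is handled before any state logic, so the stop
  // button responds in every state, inside the perceptual frame.
  if (event.kind === "cancel") return { kind: "cancelled" };
  switch (state.kind) {
    case "idle":
      return event.kind === "submit" ? { kind: "thinking" } : state;
    case "thinking":
      if (event.kind === "token") return { kind: "streaming", text: event.chunk };
      if (event.kind === "backgroundTimeout") return { kind: "background" };
      return state;
    case "streaming":
      if (event.kind === "token")
        return { kind: "streaming", text: state.text + event.chunk };
      if (event.kind === "complete") return { kind: "done", text: state.text };
      if (event.kind === "backgroundTimeout") return { kind: "background" };
      return state;
    default:
      return state; // done / cancelled / background are terminal here
  }
}
```

Handling `cancel` before the per-state logic is the code-level version of layer 5: the escape hatch must work no matter where the flow currently is.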

What to tune

  1. Pre-action — submit button echo within ~50 ms; thinking dots cover the time-to-first-token gap.
  2. First 1 s — thinking state in place. No spinner, no skeleton over content the model will produce.
  3. 1 – 10 s — token streaming where text is the output. Tool-call transparency where the work is visible.
  4. Past 10 s — engaging copy where applicable; determinate progress where the inference reports phases. Cancellation always visible.
  5. Past 30–60 s — hand-off to background sync with notification. The foreground is no longer the right surface.
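
Step 3's "natural reading rhythm" means pacing the render loop rather than dumping tokens at raw network arrival speed. A minimal pacing helper, assuming a target of roughly 40 characters per second (a figure chosen for illustration, not one the article specifies):

```typescript
// Paces token rendering to a steady reading rhythm instead of raw
// network arrival speed. The 40 chars/sec target is an assumption
// for illustration.

const TARGET_CHARS_PER_SEC = 40;

// Given how many characters are already shown and how long the stream
// has been running, return how many more characters may be revealed now.
function charsToReveal(shownChars: number, elapsedMs: number): number {
  const allowed = Math.floor((elapsedMs / 1000) * TARGET_CHARS_PER_SEC);
  return Math.max(0, allowed - shownChars);
}
```

A render loop calls this each frame and reveals at most that many buffered characters, so bursts from the network smooth out into an even cadence.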

When perceived performance hurts you here

The engagement-vs-retrospective-duration trade is the central trap. A 30-second inference with rich engagement feels short while it runs and long in retrospect — the user remembers it as taking forever even when their session went smoothly. Slack and FIFA accept this; for inference where the user repeats the action many times in a session, the retrospective cost compounds.

The cleaner answer for repeat-use AI inference: ship determinate progress where measurable, tool-call transparency where applicable (the user is learning during the wait), and background sync past 60 s. Generic engaging content (motivational quotes, mini-games) belongs only on rare or one-off inferences.
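
That guidance collapses into a small decision helper. The input shape and strategy names below are hypothetical, a sketch of the priority order the paragraph describes:

```typescript
// Picks a wait-time strategy per the guidance above.
// Field names and strategy labels are illustrative assumptions.

interface WaitContext {
  expectedSeconds: number;      // rough estimate of inference length
  reportsPhases: boolean;       // can the backend emit "Step 3 of 7"?
  hasVisibleToolCalls: boolean; // is there real work to narrate?
  repeatUse: boolean;           // run many times per session?
}

function waitStrategy(ctx: WaitContext): string {
  if (ctx.expectedSeconds > 60) return "background-sync";
  if (ctx.reportsPhases) return "determinate-progress";
  if (ctx.hasVisibleToolCalls) return "tool-call-transparency";
  // Generic engaging content only where the retrospective cost
  // will not compound across repeated runs.
  return ctx.repeatUse ? "plain-streaming" : "engaging-content";
}
```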

Accessibility

  1. aria-live="polite" on the streaming output and on tool-call narration.
  2. aria-busy="true" during the inference; flip on completion.
  3. prefers-reduced-motion: reduce — replace cross-fades and pulse animations with static states.
  4. Always provide a way out — visible cancellation, visible "do in background", visible re-attempt on failure.
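
Items 1–3 of the checklist can be sketched as two small helpers. Returning plain records keeps the sketch framework-agnostic; the function shapes and class names are assumptions for illustration:

```typescript
// Maps inference state to the ARIA attributes from the checklist above.
// The function shape is an illustrative assumption.

function ariaFor(running: boolean): Record<string, string> {
  return {
    // Streamed output and tool-call narration announce politely,
    // without interrupting the screen reader mid-sentence.
    "aria-live": "polite",
    // Busy while the inference runs; flipped on completion.
    "aria-busy": running ? "true" : "false",
  };
}

// Under prefers-reduced-motion: reduce (e.g. from
// matchMedia("(prefers-reduced-motion: reduce)").matches), swap
// pulse and cross-fade animations for a static state. The class
// names here are hypothetical.
function animationClass(prefersReducedMotion: boolean): string {
  return prefersReducedMotion ? "thinking-static" : "thinking-pulse";
}
```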

References


  1. Block & Zakay 1997

    Block, R. A., & Zakay, D. (1997). Prospective and retrospective duration judgments: A meta-analytic review. Psychonomic Bulletin & Review, 4(2), 184–197. The trade-off engagement makes during long inference waits.

  2. Fitch

    Fitch, E. Perceived Performance: The Only Kind That Really Matters (conference talk). Engaging-loading examples (Slack, FIFA) that map onto long AI inference waits.

  3. Myers 1985

    Myers, B. A. (1985). The importance of percent-done progress indicators for computer-human interfaces. Proceedings of CHI '85, 11–17. Determinate progress where measurable.