Just‑in‑time AI code generation: build the view you need now, not the whole app

~ 5 min read

What follows is very much a thought experiment: what does it mean if Artificial Intelligence (AI) generates code just in time (JIT) as a user interacts with a user interface (UI), when the prevailing “vibe coding” approach is to code the whole app up front, speculatively?

What are we trying to achieve?

Most teams still plan, design, and build applications page‑by‑page ahead of time. With modern AI systems capable of synthesising production‑grade code and UI from structured intents, there’s another path: generate only the next view when the user actually needs it.

This post explores a practical architecture for just‑in‑time (JIT) AI code generation that:

  • Defers work until an interaction demands it.
  • Shrinks upfront scope and time‑to‑value.
  • Lowers GPU pressure by avoiding wasteful speculative generation.
  • Keeps quality high through guardrails, caching, and canary checks.

Why now? Tool‑augmented LLMs, strong UI/component systems, and infrastructure that can compile, sandbox, and ship micro‑artefacts in seconds make this viable today.

TL;DR

  • Generate the smallest viable artefact just in time: a page, fragment, or API handler responding to concrete user intent.
  • Use event topics (e.g., view.requested, form.submitted) to trigger generation.
  • Cache, sign, and reuse generated artefacts; regenerate only on drift or new constraints.
  • Measure latency as TTFT (time‑to‑first‑token/paint) and total build time; aim for sub‑second TTFT via skeleton UIs and streaming.
  • You sidestep GPU scarcity not by buying more GPUs but by eliminating unnecessary generations: create views reactively instead of speculatively.

What is JIT AI code generation?

JIT generation is the practice of synthesising code only when it’s proven necessary by a concrete interaction. Instead of pre‑building every route and state, the system:

  1. Captures intent (user action + context).
  2. Plans and validates a minimal artefact (UI + data contract).
  3. Generates code with an LLM/toolchain.
  4. Compiles/tests/sandboxes the artefact.
  5. Deploys it behind a stable route/contract.

If the same artefact is requested again with the same constraints, reuse from cache; otherwise, refine or fork.

Why this reduces GPU pressure

  • No speculative UX: Don’t render 20 pages “just in case”; a large share would never be used. Spend GPU minutes only when an interaction proves a need.
  • Smaller prompts: JIT keeps the active context small (current task and constraints), which cuts tokens. Fewer tokens → fewer inference cycles.
  • Shorter outputs: Generating a focused view/component is cheaper than a whole app skeleton. Compose as you go.
  • Better hit rates: Caching at the artefact level means repeated flows are nearly free.

This shifts GPU usage from up‑front batch generation to thin, demand‑driven bursts.

Architecture at a glance

  • Event bus: Domain events trigger generation, e.g., ui.view.requested, data.schema.changed, or auth.policy.updated.
  • Planner: Validates the request, retrieves domain context, and selects a template/component kit.
  • Generator: LLM + tools that produce UI, wiring, and tests from templates plus constraints.
  • Verifier: Static checks, type checks, policy lint (PII, auth), and minimal canary tests.
  • Runtime packager: Emits a versioned, signed artefact (chunk/route) that can be hot‑loaded.
  • Cache + registry: Deduplicates and serves previously generated artefacts.
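
To make the division of labour concrete, here’s a minimal TypeScript sketch of contracts these stages might satisfy. Every name below (ViewIntent, Artefact, Planner, and so on) is illustrative, not a published API:

type EventTopic = 'ui.view.requested' | 'data.schema.changed' | 'auth.policy.updated';

interface ViewIntent {
    dataset: string;                      // e.g. "cfs.revenue"
    slice?: string;                       // e.g. "cohort"
    viz?: 'line' | 'bar' | 'table';
    filters: unknown[];
}

interface RequestContext { userId: string; policyProfile: string; schemaVersion: string; }
interface Plan { template: string; constraints: string[]; }
interface GeneratedCode { files: Record<string, string>; }
interface Artefact { route: string; hash: string; signature: string; }

// Each stage sits behind a narrow, testable interface.
interface Planner   { plan(intent: ViewIntent, ctx: RequestContext): Promise<Plan>; }
interface Generator { generate(plan: Plan): Promise<GeneratedCode>; }
interface Verifier  { verify(code: GeneratedCode, ctx: RequestContext): Promise<void>; } // throws on failure
interface Packager  { pack(code: GeneratedCode): Promise<Artefact>; }
interface Registry  {
    get(key: string): Promise<Artefact | null>;
    put(key: string, artefact: Artefact): Promise<void>;
}

Keeping each stage behind a narrow interface is what lets you swap models, verifiers, or packagers without touching the event bus.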

Event → artefact flow (example)

  1. User clicks a “Revenue by Cohort” card on the dashboard.
  2. The UI publishes ui.view.requested with: dataset=cfs.revenue, slice=cohort, viz=line, filters=[…].
  3. Planner selects a chart view template and fetches the typed data contract.
  4. Generator produces a page with a chart component, data loader, and access control checks.
  5. Verifier type‑checks and runs a 500 ms canary (e.g. dataset schema satisfies the query, component renders in jsdom).
  6. Packager compiles to a versioned route /views/revenue-by-cohort?v=sha256:abc.
  7. Router mounts the chunk. Later requests reuse it from the registry unless inputs change.
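
For concreteness, the ui.view.requested event from step 2 might carry a payload like this. The envelope and field names are assumptions based on the example above:

const evt = {
    topic: 'ui.view.requested',
    userId: 'u_42',                       // hypothetical user id
    payload: {
        dataset: 'cfs.revenue',
        slice: 'cohort',
        viz: 'line',
        filters: [],                      // elided in the example above
    },
} as const;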

Latency budgets that feel instant

  • Skeleton-first paint: 50–150 ms via pre‑shipped shell and suspense placeholders.
  • Generation + verify + bundle: 200–1200 ms depending on complexity and cache.
  • First interactive paint: < 1 s at the p50; tolerate p95 up to ~2–3 s with optimistic UI and progress affordances.

Treat TTFT (first skeleton paint) as the primary UX metric; generation can finish in the background if the shell communicates progress clearly.
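
In practice that means mounting the pre‑shipped shell immediately and swapping in the generated chunk when it lands. A minimal sketch, assuming a Slot abstraction and the ViewIntent shape from the architecture sketch; none of these helpers are a real framework API:

type Stage = 'planning' | 'generating' | 'verifying' | 'packaging';

interface Slot {
    reportProgress(stage: Stage): void;   // keep the shell honest about progress
    mountView(route: string): void;
}

declare function mountSkeleton(intent: ViewIntent): Slot;    // paints in ~50–150 ms
declare function requestArtefact(
    intent: ViewIntent,
    opts: { onProgress: (stage: Stage) => void },
): Promise<{ route: string }>;

async function openView(intent: ViewIntent): Promise<void> {
    const slot = mountSkeleton(intent);                      // TTFT: skeleton paints now
    const artefact = await requestArtefact(intent, {
        onProgress: (stage) => slot.reportProgress(stage),
    });
    slot.mountView(artefact.route);                          // first interactive paint
}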

Guardrails that keep this safe

  • Templates + typed contracts: Start from audited components and server handlers, not arbitrary code.
  • Policy checks: Enforce auth, PII, data locality, and logging requirements in the verifier stage.
  • Sandboxing: Execute generated code in an isolated runtime; no filesystem or network outside allow‑lists.
  • Observability: Emit traces and metrics for generation time, canary failures, and cache hit rate.
  • Human review gates: For sensitive domains, require approval before hot‑loading the artefact.
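
As a flavour of the verifier stage, a fail‑closed source scan might look like the sketch below. The rules are illustrative examples, not an exhaustive policy set:

// Reject generated code that reaches outside its sandbox; illustrative rules only.
const BANNED: Array<[RegExp, string]> = [
    [/\beval\s*\(/, 'dynamic eval is not allowed'],
    [/process\.env/, 'must not read raw environment variables'],
    [/\bfetch\s*\(\s*['"]http/, 'network calls must go through the allow-listed loader'],
];

function scanSource(files: Record<string, string>): string[] {
    const violations: string[] = [];
    for (const [path, source] of Object.entries(files)) {
        for (const [pattern, reason] of BANNED) {
            if (pattern.test(source)) violations.push(`${path}: ${reason}`);
        }
    }
    return violations;    // non-empty => reject before packaging
}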

Caching and invalidation

  • Key artefacts by (intent, constraints, data contract version, policy profile).
  • Attach an SBOM and signature. Store in a content‑addressed registry.
  • Revalidate on schema or policy change events. Prefer patching the artefact over full regeneration where possible.
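
Key construction can be as simple as hashing a canonical serialisation of everything that could change the artefact. A minimal sketch, assuming Node’s crypto module and JSON‑serialisable inputs:

import { createHash } from 'node:crypto';

// Canonical JSON: objects serialised with sorted keys so equal inputs hash equally.
function canonical(value: unknown): string {
    if (value === null || typeof value !== 'object') return JSON.stringify(value);
    if (Array.isArray(value)) return `[${value.map(canonical).join(',')}]`;
    const entries = Object.entries(value as Record<string, unknown>)
        .sort(([a], [b]) => a.localeCompare(b))
        .map(([k, v]) => `${JSON.stringify(k)}:${canonical(v)}`);
    return `{${entries.join(',')}}`;
}

// Matches the cacheKey(intent, policyProfile, schemaVersion) call in the pseudo-code below.
function cacheKey(intent: unknown, policyProfile: string, schemaVersion: string): string {
    const body = canonical({ intent, policyProfile, schemaVersion });
    return 'sha256:' + createHash('sha256').update(body).digest('hex');
}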

Example pseudo‑code

// Event consumer (planner)
on('ui.view.requested', async (evt) => {
    const ctx = await loadContext(evt.userId);
    const intent = normalise(evt.payload); // dataset, chart type, filters
    const key = cacheKey(intent, ctx.policyProfile, ctx.schemaVersion);

    // Fast path: an identical intent under the same contract and policies
    // resolves straight from the registry.
    const cached = await registry.get(key);
    if (cached) return mount(cached.route);

    // Slow path: plan → generate → verify → package.
    const plan = await planView(intent, ctx);
    const generated = await llmGenerate(plan, templates());

    await verify(generated, ctx.policies); // throws on policy or canary failure
    const artifact = await packageArtifact(generated);

    await registry.put(key, artifact);
    return mount(artifact.route); // hot-load the versioned chunk
});
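
One wrinkle the consumer above glosses over: two sessions can request the same uncached view at once and trigger duplicate generations. A single‑flight map in front of the slow path is a common fix; a sketch, with Artefact as in the architecture section:

const inFlight = new Map<string, Promise<Artefact>>();

// Concurrent identical requests await the same in-progress generation.
async function generateOnce(key: string, build: () => Promise<Artefact>): Promise<Artefact> {
    const existing = inFlight.get(key);
    if (existing) return existing;
    const pending = build().finally(() => inFlight.delete(key));
    inFlight.set(key, pending);
    return pending;
}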

Economics: why this beats prebuilding

  • Token cost: Prompt and completion tokens scale with scope. JIT limits scope to the next step only.
  • Compute time: Many prebuilt views are never visited; JIT eliminates that waste entirely.
  • Engineering time: You spec less up front; you validate with real interactions.

What about navigation and SEO?

  • Public marketing pages should remain statically generated (like this site) for crawlability and speed.
  • Authenticated/product views can be JIT since bots don’t index them and contracts can be stable even if code is generated on demand.
  • For shareable deep links, mint the route on first access and keep it cached.

Risks and mitigations

  • Cold misses feel slow: Use optimistic UI, prefetch likely intents, and warm the cache from usage analytics.
  • Drift and sprawl: Enforce de‑duplication and archive unreferenced artefacts.
  • Quality gaps: Invest in verifier rules and tight templates; don’t let raw model output ship unreviewed.
  • Debuggability: Emit source maps and link artefacts to plans and prompts for post‑mortems.
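
For the cold‑miss mitigation in particular, warming can be as simple as replaying the most frequent intents from analytics during quiet hours. A sketch with assumed helpers (topIntents, generateAndCache) and the ViewIntent shape from earlier:

declare function topIntents(limit: number): Promise<ViewIntent[]>;     // from usage analytics
declare function generateAndCache(intent: ViewIntent): Promise<void>;  // runs the full pipeline

async function warmCache(limit = 20): Promise<void> {
    for (const intent of await topIntents(limit)) {
        try {
            await generateAndCache(intent);               // populate the registry ahead of demand
        } catch (err) {
            console.warn('warm-cache skip', intent, err); // warming is best-effort
        }
    }
}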

From theory to practice

  • Start with a single flow: generate a report view on demand.
  • Instrument thoroughly: TTFT, generation time, cache hits, error rate.
  • Iterate on templates and policies based on real failures.
  • Expand to CRUD screens, wizards, and long‑tail admin views.

Just‑in‑time AI code generation doesn’t mean “generate everything, always.” It means “generate only what a user just proved they need.” Done well, it trims cost, sidesteps GPU pressure, and, most importantly, keeps you shipping value at the pace of interaction.
