From Vibe Coding Assistants to Coding Assistants

Modern coding assistants often firehose code: a single prompt yields pages of plausible implementation that the developer must reverse‑engineer, reconcile with project conventions, and defend in review. That behavior is optimized for vibe coding—outputs that look complete and fluent—rather than for actual coding, where teams must understand, verify, and safely integrate changes. The result is slower reviews, hidden bugs, and brittle releases disguised as productivity.

The conceptual reason is simple: the binding constraint in real software work is human comprehension. Reviewers and maintainers must be able to grasp intent, check correctness, and assess risk quickly. If we let $C$ denote the cost of a change, a pragmatic objective is

$$C \approx \underbrace{T_{\text{understand}}}_{\text{grasp intent}} + \underbrace{T_{\text{verify}}}_{\text{prove behavior}} + \underbrace{I}_{\text{integration friction}} + \underbrace{\mathbb{E}[R]}_{\text{post‑merge risk}}$$

One‑shot code dumps may minimize keystrokes, but they inflate every term in $C$. An assistant that optimizes for comprehension must therefore change cadence: reason first, then code; propose tests before implementation; emit minimal, local patches; attach telemetry and logs that make behavior observable. These are not stylistic preferences—they are direct levers on $T_{\text{understand}}$, $T_{\text{verify}}$, $I$, and $\mathbb{E}[R]$.

If we accept that people are doing real engineering—not vibe demos—the product should support a more principled workflow that simulates how expert engineers actually work. Experts decompose precisely, articulate invariants up front, iterate in tight test loops, commit small reviewable diffs, refactor for clarity, and validate in staging with signals. The assistant should adopt this ritual by default and adapt it to the user and task (relaxing only for boilerplate scaffolding).

Translating that into UI/UX affordances:

  • Scope control at the outset. Let the user set granularity—“modify this function only,” “touch this file section,” or “scaffold tests, no implementation yet.” Clear scope prevents accidental API drift and lowers integration cost.
  • Plan approval before code. The first response is a compact plan: assumptions, invariants, complexity/perf trade‑offs, risks, and success checks. The user approves or edits the plan; only then does the assistant generate code. This makes intent explicit and cheap to correct.
  • Test‑first mode by default. The assistant proposes failing tests that encode the spec (unit/property/integration). After the user runs them, it offers the minimal passing patch and then an edge‑case regression suite. Tests externalize intent and compress verification time.
  • Patch‑oriented diffs, not overwrites. Suggestions arrive as git‑friendly diffs, one file/hunk at a time, with accept/revise/reject per hunk. Small, local patches bound the reviewer's working memory and make rollback trivial.
  • Repo‑aware suggestions. The assistant ingests lint/format rules, type settings, module boundaries, and the dependency graph before proposing changes. Respecting architecture removes incidental review friction.
  • Telemetry and logging with every patch. Show coverage deltas on touched targets, lightweight perf impacts (latency/allocations), and side‑effect summaries. Propose structured logs (event names, fields/types, sampling, levels, correlation IDs, secret/PII redaction) and include expect‑logs assertions in tests where appropriate. Facts, not vibes, drive acceptance. (A minimal event‑helper sketch follows this list.)
  • Determinism and safety controls. Dry‑run scripts; pin seeds for ML; default to network‑off for local execution; provide obvious rollback commands. For logging, apply privacy‑first defaults (schema enforcement, sampling/retention, egress policies). (A test‑defaults sketch follows as well.)
  • A deliberate “scaffolding” switch. When the goal is repetitive structure (SDKs, CRUD, config), allow bigger one‑shot generations—clearly labeled as trading nuance for speed—then collapse back to incremental mode for integration.
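
To ground the telemetry bullet, here is a minimal sketch of a structured log‑event helper with sampling and redaction defaults. The helper name, event names, redacted field list, and sample rates are illustrative assumptions rather than a prescribed schema; a real repo would load them from configuration.

```python
import json
import logging
import random
import time
import uuid

logger = logging.getLogger("upload")

# Illustrative defaults; in practice these belong in repo-level config.
REDACTED_FIELDS = {"auth_token", "email"}  # assumed secret/PII field names
SAMPLE_RATES = {"retry_attempt": 0.1}      # sample noisy events; keep everything else


def log_event(name: str, *, correlation_id: str | None = None, **fields) -> None:
    """Emit one structured log line: stable event name, flat fields, redaction, sampling."""
    if random.random() > SAMPLE_RATES.get(name, 1.0):
        return  # sampled out
    payload = {
        "event": name,
        "ts": time.time(),
        "correlation_id": correlation_id or str(uuid.uuid4()),
        **{k: ("[REDACTED]" if k in REDACTED_FIELDS else v) for k, v in fields.items()},
    }
    logger.info(json.dumps(payload))
```

A test can then capture the logger's output and assert that, say, a retry_exhausted event was emitted with the expected fields, which is the expect‑logs assertion the bullet refers to.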
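
The determinism bullet can be made similarly concrete. Below is a sketch of repo‑local test defaults, assuming pytest: pinned RNG seeds and a guard that fails any accidental network call. The file location and fixture names are assumptions.

```python
# conftest.py (sketch; fixture names are illustrative)
import random
import socket

import pytest


@pytest.fixture(autouse=True)
def pinned_seeds():
    """Pin RNG seeds so local runs are reproducible."""
    random.seed(1337)
    # If the repo uses numpy or torch, pin their seeds here as well.


@pytest.fixture(autouse=True)
def network_off(monkeypatch):
    """Default to network-off for local execution; force tests onto fakes."""
    def refuse(*args, **kwargs):
        raise RuntimeError("network access is disabled in tests; inject a fake instead")
    monkeypatch.setattr(socket.socket, "connect", refuse)
```

monkeypatch undoes the patch after each test, so the guard never leaks into tooling that legitimately needs the network.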

A short vignette makes the cadence concrete. The prompt is: “Add retry logic with exponential backoff to upload_file(); include tests for failure scenarios.” The assistant first returns reasoning only—where to hook retries, which exceptions are retryable, jitter choice, maximum attempts, idempotency guarantees, interface impact, and how success will be measured. After approval, it generates failing tests covering timeouts, transient 5xx, non‑retryable 4xx, and idempotency. The user runs the suite (failures expected). On request, the assistant proposes the minimal passing patch that touches only upload.py, updates docstrings and types, and introduces structured log events (retry_started, retry_attempt, retry_succeeded/retry_exhausted) with sampling and redaction defaults. Tests assert on these logs. Alongside the diff, the assistant reports coverage up on touched targets and a negligible latency delta. CI passes; the developer squashes and merges. Intent and verification live with the change.
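
To make the vignette's test‑first step concrete, here is a sketch of the failing tests, assuming pytest. upload_file and upload.py come from the prompt; the exception classes, the MAX_ATTEMPTS constant, and the _do_upload/_sleep injection points are assumptions shared with the patch sketch that follows.

```python
# test_upload_retry.py (sketch; written before the implementation, so it fails first)
import logging

import pytest

from upload import MAX_ATTEMPTS, PermanentUploadError, TransientUploadError, upload_file


def make_flaky(failures: int):
    """Return a fake uploader that raises `failures` transient errors, then succeeds."""
    calls = {"n": 0}

    def fake(path):
        calls["n"] += 1
        if calls["n"] <= failures:
            raise TransientUploadError("simulated timeout / transient 5xx")

    return fake, calls


def test_transient_failures_are_retried_until_success(caplog):
    fake, calls = make_flaky(failures=2)
    with caplog.at_level(logging.INFO):
        upload_file("report.csv", _do_upload=fake, _sleep=lambda s: None)
    assert calls["n"] == 3                   # two retries, then success
    assert "retry_succeeded" in caplog.text  # expect-logs assertion


def test_retries_stop_after_max_attempts(caplog):
    fake, calls = make_flaky(failures=MAX_ATTEMPTS + 1)
    with caplog.at_level(logging.INFO), pytest.raises(TransientUploadError):
        upload_file("report.csv", _do_upload=fake, _sleep=lambda s: None)
    assert calls["n"] == MAX_ATTEMPTS
    assert "retry_exhausted" in caplog.text


def test_non_retryable_errors_fail_immediately():
    def fake(path):
        raise PermanentUploadError("simulated 403")

    with pytest.raises(PermanentUploadError):
        upload_file("report.csv", _do_upload=fake, _sleep=lambda s: None)


def test_retries_resend_the_same_arguments_for_idempotency():
    seen = []
    state = {"n": 0}

    def fake(path):
        seen.append(path)
        state["n"] += 1
        if state["n"] == 1:
            raise TransientUploadError("first attempt fails")

    upload_file("report.csv", _do_upload=fake, _sleep=lambda s: None)
    assert seen == ["report.csv", "report.csv"]  # same payload key on retry
```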
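And the minimal passing patch the vignette describes, again as a hedged sketch that touches only upload.py. The retryable/permanent exception split, the backoff constants, and the injection points are assumptions; sampling and redaction defaults are elided here (see the log_event sketch above) to keep the diff small.

```python
# upload.py (sketch of the minimal passing patch; names beyond upload_file are assumed)
import logging
import random
import time

logger = logging.getLogger(__name__)

MAX_ATTEMPTS = 5
BASE_DELAY_S = 0.5


class TransientUploadError(Exception):
    """Retryable failure: timeout or transient 5xx response."""


class PermanentUploadError(Exception):
    """Non-retryable failure such as a 4xx response."""


def upload_file(path: str, *, _do_upload=None, _sleep=time.sleep) -> None:
    """Upload a file, retrying transient failures with exponential backoff and full jitter.

    The upload must be idempotent: each retry resends the same content to the same key.
    _do_upload and _sleep are injection points so tests can run without real I/O or sleeps.
    """
    do_upload = _do_upload or _perform_upload
    logger.info("retry_started", extra={"path": path, "max_attempts": MAX_ATTEMPTS})
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            do_upload(path)
            logger.info("retry_succeeded", extra={"attempt": attempt})
            return
        except PermanentUploadError:
            raise  # 4xx and other non-retryable errors propagate immediately
        except TransientUploadError:
            logger.warning("retry_attempt", extra={"attempt": attempt})
            if attempt == MAX_ATTEMPTS:
                logger.error("retry_exhausted", extra={"attempts": attempt})
                raise
            # Exponential backoff with full jitter to avoid synchronized retry storms.
            _sleep(random.uniform(0, BASE_DELAY_S * 2 ** (attempt - 1)))


def _perform_upload(path: str) -> None:
    raise NotImplementedError("real transport lives here; the tests inject a fake")
```

Full jitter keeps concurrent clients from retrying in lockstep, and the injection points exist only so the tests above can run deterministically.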

We should judge such a system by outcomes, not demos: review time per PR, rework/revert rate, flaky test incidence, time‑to‑green, and per‑patch deltas in coverage, performance, and bundle size. If the assistant truly optimizes comprehension, these metrics improve without raising merge risk. Safeguards remain non‑negotiable: contamination scans so the assistant does not echo licensed or templated code; deterministic runs; dry‑runs for side‑effecting scripts; and privacy‑first logging policies.
