How I Started Coding by Voice Prompts (and Why I Still Press Enter Manually)

~ 4 min read


I didn’t set out to become a voice-first developer. It started as a small experiment to reduce typing, and now it is a normal part of how I work with AI coding tools.

I still type a lot, especially for final edits and precise refactors, but prompt creation has shifted heavily toward speech.

Why voice started working for me

The biggest unlock was simple: I work remotely.

At home, I can think out loud, iterate quickly, and feed those ideas straight into tools without worrying about bothering a room full of people. Voice prompts feel natural in that setup, especially when I am exploring trade-offs.

In practice, tools like Superwhisper and WisprFlow strip out filler words and distill that rambling first pass into something coherent.
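As an illustration of the kind of cleanup those tools do (a toy sketch, not how either product actually works), a filler-word pass might look like:

```python
import re

# Hypothetical cleaner: drop common English filler words from a raw
# dictation transcript, then collapse the leftover whitespace.
FILLERS = re.compile(r"\b(um+|uh+|like,|you know,|I mean,)\s*", re.IGNORECASE)

def clean_transcript(raw: str) -> str:
    """Remove filler words and tidy spacing in a raw dictation transcript."""
    text = FILLERS.sub("", raw)
    text = re.sub(r"\s{2,}", " ", text).strip()
    return text

raw = "um so, like, refactor the uh parser to, you know, handle nested lists"
print(clean_transcript(raw))
# → so, refactor the parser to, handle nested lists
```

Real dictation tools use far more sophisticated models for this, but the effect from the user's side is the same: you ramble, and something coherent comes out.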

Put 100 people in an open office all doing this, and it would be chaos. Even with excellent manners, acoustic spillover becomes everyone else’s context-switch tax.

That concern is not just intuition. Open-plan office studies repeatedly find that irrelevant speech is one of the most disruptive noise types for cognitive tasks, including memory-heavy and verbal work.

What my workflow looks like today

I use voice-to-text tooling to capture rough prompts, then I clean and validate before sending.

The goal is straightforward: speak naturally, then review a cleaned transcript instead of raw dictation.

Current pattern:

  1. Speak the initial prompt as a rough thought.
  2. Let voice-to-text transcribe and clean formatting.
  3. Edit the text manually to tighten intent, constraints, and acceptance criteria.
  4. Press Enter myself.

That last step matters. I’m still not brave enough to let any voice layer auto-submit directly into coding agents. I want one explicit review gate before execution.

Evidence: is anyone else doing this, and is it faster?

Short answer: yes, people are doing it, and speech input can be much faster for text entry. But end-to-end software development throughput is more nuanced.

1) Speech can be around 3x faster than typing for text entry

A controlled study on mobile text entry found speech input was about 2.9x faster than keyboard input in English (153 vs 52 words per minute) under lab conditions, with lower corrected error rates during entry.

That is the strongest basis for the “3x” claim, but it is about text-entry throughput, not end-to-end software delivery.

2) Programming by voice is real, but hybrid input often wins

Research on conversational programming interfaces found participants valued different strengths: voice for efficiency, text for precision. It also found that experience level mattered, with novice users more optimistic about voice programming than advanced users.

That maps to my experience. Voice is excellent for drafting intent and broader instructions, while keyboard editing is still better for fine control.

3) Open-office speech is a genuine productivity constraint

Studies in office acoustics and cognition consistently report that intelligible background speech in open-plan offices impairs performance and increases perceived distraction.

This reinforces why remote work made voice-first prompting viable for me. Environment is not a side detail; it is the primary constraint.

Practical caveats

Voice prompting is not automatically faster overall if your review process is weak.

You can lose all of the time gained if:

  • prompts are vague,
  • transcription is noisy,
  • or agent output needs heavy correction.

My guardrail is to treat voice as a fast input channel, not an autopilot channel.
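In practice that guardrail amounts to a few cheap sanity checks before anything is sent. Here is a toy version; the thresholds and keyword list are made up for illustration:

```python
# A hedged sketch of a pre-send guardrail: cheap sanity checks on a
# voice-drafted prompt. Thresholds and keywords are invented examples,
# not a real heuristic from any tool.
def prompt_looks_ready(prompt: str) -> list[str]:
    """Return a list of problems; an empty list means 'probably fine to send'."""
    problems = []
    if len(prompt.split()) < 5:
        problems.append("too short: likely a fragment or a mis-transcription")
    if "[inaudible]" in prompt or "???" in prompt:
        problems.append("transcription noise markers still present")
    if not any(w in prompt.lower() for w in ("should", "must", "add", "fix", "refactor")):
        problems.append("no actionable verb or constraint detected")
    return problems

print(prompt_looks_ready("um fix"))
```

Anything flagged goes back through a manual edit pass rather than to the agent.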

Where I am now

I type less than before, especially for prompt drafting, planning, and exploratory coding tasks.

I still keep a manual review before execution, and I think that is the right trade-off for now. If reliability keeps improving, maybe I will trust auto-send later. Just not today.
