Why this matters
A button that starts speech recognition looks simple, but it sits at the intersection of:
- Keyboard interaction
- Screen readers
- Voice control software
- Browser speech APIs that frequently fail or change state
Small markup mistakes here can make the feature unusable for entire groups of users.
This article shows a practical, standards-aligned pattern that gives you strong accessibility coverage without unnecessary ARIA.
1. Always use a real <button>
Start with native HTML. Avoid clickable <div> or <span> elements.
<button id="speech-btn" type="button">Start voice input</button>
Why this works out of the box:
- Keyboard support (Enter / Space)
- Correct semantics for assistive tech
- Built-in focus handling
No ARIA is required at this stage.
2. Expose listening state clearly
Speech input is stateful. Users must know when the app is listening.
The most reliable pattern is:
- A toggle button
- Visible label changes
- aria-pressed to expose state
- A live region to announce transitions
Markup
<button
  id="speech-btn"
  type="button"
  aria-pressed="false"
  aria-describedby="speech-status"
>
  Start voice input
</button>
<span id="speech-status" class="sr-only" aria-live="polite"></span>
JavaScript
function setListening(isListening) {
  const btn = document.getElementById("speech-btn");
  const status = document.getElementById("speech-status");

  btn.setAttribute("aria-pressed", String(isListening));
  btn.textContent = isListening ? "Stop voice input" : "Start voice input";

  status.textContent = isListening
    ? "Voice input is active"
    : "Voice input stopped";
}
What this achieves:
- Screen readers announce state changes
- Sighted users see a clear label change
- Voice-control users can say “Click Stop voice input”
3. Icon-only buttons need an accessible name
This is not accessible on its own:
<button>
  <svg>🎤</svg>
</button>
Instead, provide an explicit label:
<button type="button" aria-label="Start voice input">
  <svg aria-hidden="true">🎤</svg>
</button>
If the button toggles, update the label dynamically:
btn.setAttribute(
  "aria-label",
  isListening ? "Stop voice input" : "Start voice input",
);
Never rely on icon shape or colour to convey state.
4. Don’t hijack focus while listening
Avoid patterns that:
- Automatically move focus
- Trap keyboard input
- Require special key combinations to stop listening
Best practice:
- The same button starts and stops listening
- Optionally allow Escape as a shortcut (but don’t rely on it)
document.addEventListener("keydown", (e) => {
  if (e.key === "Escape" && isListening) {
    stopListening();
  }
});
Keyboard users should always have a clear exit.
5. Announce errors and permission failures
Speech APIs fail frequently due to:
- Microphone permissions
- Missing hardware
- Network issues
Use a dedicated alert region for genuine errors:
<div id="speech-error" role="alert" hidden></div>
function showError(message) {
  const el = document.getElementById("speech-error");
  el.hidden = false;
  el.textContent = message;
}
Use role="alert" sparingly — only for failures that require user action.
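The error event on a SpeechRecognition instance carries a machine-readable error code (for example "not-allowed" or "audio-capture"). One way to decide what to surface is to map codes to user-facing messages before calling showError. This is a minimal sketch — the message wording and the describeSpeechError name are illustrative, not part of any API:

```javascript
// Map Web Speech API error codes to user-facing messages.
// The codes are defined by the spec; the wording here is illustrative.
function describeSpeechError(code) {
  switch (code) {
    case "not-allowed":
    case "service-not-allowed":
      return "Microphone access was denied. Check your browser permissions.";
    case "audio-capture":
      return "No microphone was found. Connect one and try again.";
    case "network":
      return "A network error interrupted speech recognition. Please try again.";
    case "no-speech":
      // Not a genuine failure — usually better to ignore than to alert.
      return null;
    default:
      return "Speech recognition failed. Please try again.";
  }
}

// Wiring it up (recognition is a SpeechRecognition instance):
// recognition.addEventListener("error", (event) => {
//   const message = describeSpeechError(event.error);
//   if (message) showError(message);
// });
```

Returning null for "no-speech" keeps the alert region reserved for failures that actually require user action, in line with the advice above.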
6. The 80/20 accessibility checklist
If you do only the following, you’ll already be in a strong position:
- Use a native <button>
- Change visible text when listening starts or stops
- Keep aria-pressed in sync with state
- Announce state changes via a polite live region
- Ensure everything works with keyboard alone
This covers the majority of real-world accessibility needs.
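The whole checklist can be driven from one place. A minimal sketch, assuming the element ids from the earlier examples — the speechButtonState helper is hypothetical, but keeping it pure makes the button/live-region sync trivial to test:

```javascript
// Compute everything the checklist requires for a given listening state.
function speechButtonState(isListening) {
  return {
    label: isListening ? "Stop voice input" : "Start voice input",
    ariaPressed: String(isListening),
    announcement: isListening ? "Voice input is active" : "Voice input stopped",
  };
}

// Apply it to the DOM (same ids as the markup in section 2).
function applySpeechState(isListening) {
  const { label, ariaPressed, announcement } = speechButtonState(isListening);
  const btn = document.getElementById("speech-btn");
  const status = document.getElementById("speech-status");
  btn.textContent = label;
  btn.setAttribute("aria-pressed", ariaPressed);
  status.textContent = announcement;
}
```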
7. ARIA pitfalls to avoid
Common mistakes:
- Adding role="button" to a <button>
- Using aria-live="assertive" for normal state changes
- Creating custom keyboard interactions when native ones exist
Native HTML first. ARIA second.
Final thoughts
Speech input is increasingly common in:
- AI chat interfaces
- Accessibility tooling
- Multimodal web apps
Getting the button right sets the tone for the entire interaction. A small amount of correct markup goes a long way.
If you’re building an AI-powered chat interface, consider extending this pattern to include:
- Accessible transcript areas
- Clear ownership between chat content and page content
- Explicit relationships between controls and generated text
- Proper handling of focus and keyboard navigation for generated content
Extra: Creating a Vue button component to accept speech input
The following implements a button that uses the browser Web Speech API
to listen and then emits a result event with the transcribed text.
The component is implemented in Vue with Tailwind CSS.
Usage:
<VoiceToTextButton @result="(text:string) => { /* consume text */ }" />
To get TypeScript type resolution, you’ll need to install @types/dom-speech-recognition as a dev dependency and
then create the following type definitions file, types/speech-recognition.d.ts.
/// <reference types="dom-speech-recognition" />

export {};

declare global {
  interface Window {
    SpeechRecognition?: typeof SpeechRecognition;
    webkitSpeechRecognition?: typeof SpeechRecognition;
  }
}
Finally, add the component at components/VoiceToTextButton.vue.
<script setup lang="ts">
import { onMounted, onUnmounted, ref, useTemplateRef } from "vue";

const emit = defineEmits<{
  (e: "result", text: string): void;
}>();

const voiceListenRef = useTemplateRef<HTMLButtonElement>("voiceListen");
const isListening = ref(false);

const stopListening = (recognition: SpeechRecognition) => {
  voiceListenRef.value?.removeAttribute("disabled");
  recognition.stop();
  isListening.value = false;
};

const abortListening = (recognition: SpeechRecognition) => {
  recognition.abort();
  isListening.value = false;
};

onMounted(() => {
  const SpeechRecognitionCtor =
    window.SpeechRecognition ?? window.webkitSpeechRecognition;

  if (!SpeechRecognitionCtor) {
    console.warn("SpeechRecognition is not supported in this browser.");
    return;
  }

  const recognition = new SpeechRecognitionCtor();

  recognition.addEventListener("result", (event: SpeechRecognitionEvent) => {
    voiceListenRef.value?.removeAttribute("disabled");
    const transcript = event.results?.[0]?.[0]?.transcript;
    if (transcript) {
      emit("result", transcript);
    }
  });

  recognition.addEventListener("error", (e: SpeechRecognitionErrorEvent) => {
    console.error("voice error", e);
    stopListening(recognition);
  });

  recognition.addEventListener("speechend", () => {
    stopListening(recognition);
  });

  const handleVisibilityChange = () => {
    if (document.hidden) {
      abortListening(recognition);
    }
  };

  const handleAbort = () => {
    abortListening(recognition);
  };

  // Note: beforeunload and pagehide fire on window, not document.
  document.addEventListener("visibilitychange", handleVisibilityChange);
  window.addEventListener("beforeunload", handleAbort);
  window.addEventListener("pagehide", handleAbort);

  onUnmounted(() => {
    document.removeEventListener("visibilitychange", handleVisibilityChange);
    window.removeEventListener("beforeunload", handleAbort);
    window.removeEventListener("pagehide", handleAbort);
    recognition.abort();
  });

  voiceListenRef.value?.addEventListener("click", () => {
    voiceListenRef.value?.setAttribute("disabled", "true");
    recognition.start();
    isListening.value = true;
  });
});
</script>
<template>
  <button
    ref="voiceListen"
    data-voice-listen
    type="button"
    :aria-pressed="isListening"
    :aria-label="isListening ? 'Stop voice input' : 'Start voice input'"
    aria-describedby="speech-status"
    class="mr-2 cursor-pointer rounded-full bg-orange-500 px-3 py-1.5 text-white shadow-sm hover:bg-orange-600 disabled:animate-pulse disabled:cursor-not-allowed disabled:bg-orange-700 disabled:shadow-none"
  >
    <svg
      xmlns="http://www.w3.org/2000/svg"
      viewBox="0 0 24 24"
      stroke-width="1.5"
      stroke="currentColor"
      aria-hidden="true"
      class="-mt-0.5 inline-block h-5 w-5 fill-none"
    >
      <path
        stroke-linecap="round"
        stroke-linejoin="round"
        d="M12 18.75a6 6 0 0 0 6-6v-1.5m-6 7.5a6 6 0 0 1-6-6v-1.5m6 7.5v3.75m-3.75 0h7.5M12 15.75a3 3 0 0 1-3-3V4.5a3 3 0 1 1 6 0v8.25a3 3 0 0 1-3 3Z"
      />
    </svg>
  </button>
  <span id="speech-status" class="sr-only" aria-live="polite">{{
    isListening ? "Voice input is active" : "Voice input stopped"
  }}</span>
</template>