Accessible Markup for a Speech Input Button

~ 5 min read

Why this matters

A button that starts speech recognition looks simple, but it sits at the intersection of:

  • Keyboard interaction
  • Screen readers
  • Voice control software
  • Browser speech APIs that frequently fail or change state

Small markup mistakes here can make the feature unusable for entire groups of users.

This article shows a practical, standards-aligned pattern that gives you strong accessibility coverage without unnecessary ARIA.


1. Always use a real <button>

Start with native HTML. Avoid clickable <div> or <span> elements.

<button id="speech-btn" type="button">Start voice input</button>

Why this works out of the box:

  • Keyboard support (Enter / Space)
  • Correct semantics for assistive tech
  • Built-in focus handling

No ARIA is required at this stage.


2. Expose listening state clearly

Speech input is stateful. Users must know when the app is listening.

The most reliable pattern is:

  • A toggle button
  • Visible label changes
  • aria-pressed to expose state
  • A live region to announce transitions

Markup

<button
    id="speech-btn"
    type="button"
    aria-pressed="false"
    aria-describedby="speech-status"
>
    Start voice input
</button>

<span id="speech-status" class="sr-only" aria-live="polite"></span>

JavaScript

function setListening(isListening) {
    const btn = document.getElementById("speech-btn");
    const status = document.getElementById("speech-status");

    btn.setAttribute("aria-pressed", String(isListening));
    btn.textContent = isListening ? "Stop voice input" : "Start voice input";

    status.textContent = isListening
        ? "Voice input is active"
        : "Voice input stopped";
}

What this achieves:

  • Screen readers announce state changes
  • Sighted users see a clear label change
  • Voice-control users can say “Click Stop voice input”
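
One way to keep the label, aria-pressed, and the live region from drifting apart is to derive all three from a single boolean. The sketch below is a hypothetical helper (describeSpeechButton and SpeechButtonView are illustrative names, not part of any library) that returns plain data, so it can be checked without a DOM.

```typescript
// Everything the button and live region need, derived from one boolean.
interface SpeechButtonView {
    label: string; // visible text of the button
    ariaPressed: "true" | "false"; // value for the aria-pressed attribute
    statusMessage: string; // text for the polite live region
}

// Hypothetical helper: a single source of truth for both states.
function describeSpeechButton(isListening: boolean): SpeechButtonView {
    return {
        label: isListening ? "Stop voice input" : "Start voice input",
        ariaPressed: isListening ? "true" : "false",
        statusMessage: isListening
            ? "Voice input is active"
            : "Voice input stopped",
    };
}

// In the browser, the result would be applied to the markup above:
//   const view = describeSpeechButton(true);
//   btn.setAttribute("aria-pressed", view.ariaPressed);
//   btn.textContent = view.label;
//   status.textContent = view.statusMessage;
```

Because the three values always come from the same call, the button can never show "Stop voice input" while still reporting aria-pressed="false".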

3. Icon-only buttons need an accessible name

This is not accessible on its own:

<button>
    <svg>🎤</svg>
</button>

Instead, provide an explicit label:

<button type="button" aria-label="Start voice input">
    <svg aria-hidden="true">🎤</svg>
</button>

If the button toggles, update the label dynamically:

btn.setAttribute(
    "aria-label",
    isListening ? "Stop voice input" : "Start voice input",
);

Never rely on icon shape or colour to convey state.
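
A quick development-time check can catch icon-only buttons that ship without a name. The sketch below is deliberately simplified: it looks only at aria-labelledby text, aria-label, and text content, whereas the full accessible name computation considers more sources. ButtonSnapshot and hasAccessibleName are hypothetical names used for illustration.

```typescript
// Plain-data snapshot of a button, so the check can run without a DOM.
interface ButtonSnapshot {
    textContent: string;
    ariaLabel: string | null;
    // Text of the element(s) referenced by aria-labelledby, already looked up.
    labelledByText: string | null;
}

// Simplified check: true if any naming source contains non-whitespace text.
function hasAccessibleName(button: ButtonSnapshot): boolean {
    const candidates = [
        button.labelledByText,
        button.ariaLabel,
        button.textContent,
    ];
    return candidates.some(
        (text) => text !== null && text.trim().length > 0,
    );
}

// An icon-only button with no label fails the check.
const unnamed = hasAccessibleName({
    textContent: "",
    ariaLabel: null,
    labelledByText: null,
}); // false

// Adding aria-label makes it pass.
const named = hasAccessibleName({
    textContent: "",
    ariaLabel: "Start voice input",
    labelledByText: null,
}); // true
```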


4. Don’t hijack focus while listening

Avoid patterns that:

  • Automatically move focus
  • Trap keyboard input
  • Require special key combinations to stop listening

Best practice:

  • The same button starts and stops listening
  • Optionally allow Escape as a shortcut (but don’t rely on it)

document.addEventListener("keydown", (e) => {
    if (e.key === "Escape" && isListening) {
        stopListening();
    }
});

Keyboard users should always have a clear exit.
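
A global Escape listener can itself become a form of hijacking if it reacts when nothing is listening, or after a dialog or menu has already handled the key. The sketch below shows one possible guard. createEscapeHandler and KeyLike are hypothetical names, and the state getter and stop callback are assumed to come from your own code.

```typescript
// The only keyboard event fields this sketch reads.
interface KeyLike {
    key: string;
    defaultPrevented: boolean;
}

// Builds a keydown handler that stops listening on Escape and
// reports whether it acted on the event.
function createEscapeHandler(
    isListening: () => boolean,
    stopListening: () => void,
): (event: KeyLike) => boolean {
    return (event) => {
        // Ignore other keys, events another handler already consumed,
        // and Escape presses while idle.
        if (
            event.key !== "Escape" ||
            event.defaultPrevented ||
            !isListening()
        ) {
            return false;
        }
        stopListening();
        return true;
    };
}

// In the browser (assuming your own state and stop function):
//   const onKeydown = createEscapeHandler(() => listening, stop);
//   document.addEventListener("keydown", onKeydown);
```

The handler never calls preventDefault, so Escape keeps working for anything else on the page that depends on it.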


5. Announce errors and permission failures

Speech APIs fail frequently due to:

  • Microphone permissions
  • Missing hardware
  • Network issues

Use a dedicated alert region for genuine errors:

<div id="speech-error" role="alert" hidden></div>

function showError(message) {
    const el = document.getElementById("speech-error");
    el.hidden = false;
    el.textContent = message;
}

Use role="alert" sparingly — only for failures that require user action.
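
To decide which failures deserve the alert region, it can help to map the error codes reported by the Web Speech API's error event to a message and a severity. The codes below are a subset of those defined for SpeechRecognitionErrorEvent.error. The message wording, the needsAlert split, and the describeSpeechError name are assumptions made for illustration.

```typescript
interface SpeechErrorNotice {
    message: string;
    // true: show in the role="alert" region; false: the polite status is enough
    needsAlert: boolean;
}

// Hypothetical mapping from SpeechRecognitionErrorEvent.error to a notice.
function describeSpeechError(code: string): SpeechErrorNotice {
    switch (code) {
        case "not-allowed":
        case "service-not-allowed":
            return {
                message:
                    "Microphone access is blocked. Allow it in your browser settings and try again.",
                needsAlert: true,
            };
        case "audio-capture":
            return {
                message: "No microphone was found. Connect one and try again.",
                needsAlert: true,
            };
        case "network":
            return {
                message:
                    "Voice input needs a network connection. Check your connection and try again.",
                needsAlert: true,
            };
        case "no-speech":
            return { message: "No speech was detected.", needsAlert: false };
        case "aborted":
            return { message: "Voice input stopped", needsAlert: false };
        default:
            return {
                message: "Voice input failed. Please try again.",
                needsAlert: true,
            };
    }
}

// In the browser, inside the recognition "error" listener:
//   const notice = describeSpeechError(event.error);
//   if (notice.needsAlert) showError(notice.message);
```

Routine outcomes such as "aborted" or "no-speech" stay out of the alert region, which keeps role="alert" reserved for failures the user has to act on.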


6. The 80/20 accessibility checklist

If you do only the following, you’ll already be in a strong position:

  • Use a native <button>
  • Change visible text when listening starts or stops
  • Keep aria-pressed in sync with state
  • Announce state changes via a polite live region
  • Ensure everything works with keyboard alone

This covers the majority of real-world accessibility needs.


7. ARIA pitfalls to avoid

Common mistakes:

  • Adding role="button" to a <button>
  • Using aria-live="assertive" for normal state changes
  • Creating custom keyboard interactions when native ones exist

Native HTML first. ARIA second.


Final thoughts

Speech input is increasingly common in:

  • AI chat interfaces
  • Accessibility tooling
  • Multimodal web apps

Getting the button right sets the tone for the entire interaction. A small amount of correct markup goes a long way.

If you’re building an AI-powered chat interface, consider extending this pattern to include:

  • Accessible transcript areas
  • Clear ownership between chat content and page content
  • Explicit relationships between controls and generated text
  • Proper handling of focus and keyboard navigation for generated content

Extra: Creating a Vue button component to accept speech input

The following implements a button that uses the browser's Web Speech API to listen for speech, then emits a result event containing the transcript. The component is written in Vue and styled with Tailwind CSS.

Usage:

<VoiceToTextButton @result="(text:string) => { /* consume text */ }" />

To get TypeScript type resolution, install @types/dom-speech-recognition as a dev dependency, then create the following type definitions file, types/speech-recognition.d.ts:

/// <reference types="dom-speech-recognition" />
export {};

declare global {
    interface Window {
        SpeechRecognition?: typeof SpeechRecognition;
        webkitSpeechRecognition?: typeof SpeechRecognition;
    }
}

Finally, add the component as components/VoiceToTextButton.vue:

<script setup lang="ts">
import { onMounted, onUnmounted, ref } from "vue";

const emit = defineEmits<{
    (e: "result", text: string): void;
}>();

const isListening = ref(false);
const isSupported = ref(true);

// Created in onMounted because the Web Speech API only exists in the browser.
let recognition: SpeechRecognition | undefined;

// The same button starts and stops listening, so it is never disabled mid-session.
const toggleListening = () => {
    if (!recognition) return;
    if (isListening.value) {
        recognition.stop();
        return;
    }
    try {
        recognition.start();
        isListening.value = true;
    } catch (err) {
        // start() throws an InvalidStateError if a session is already running.
        console.error("voice error", err);
    }
};

// Discard the current session without waiting for a result.
const abortListening = () => {
    recognition?.abort();
    isListening.value = false;
};

const handleVisibilityChange = () => {
    if (document.hidden) {
        abortListening();
    }
};

onMounted(() => {
    const SpeechRecognitionCtor =
        window.SpeechRecognition ?? window.webkitSpeechRecognition;

    if (!SpeechRecognitionCtor) {
        console.warn("SpeechRecognition is not supported in this browser.");
        isSupported.value = false;
        return;
    }

    recognition = new SpeechRecognitionCtor();

    recognition.addEventListener("result", (event: SpeechRecognitionEvent) => {
        const transcript = event.results?.[0]?.[0]?.transcript;
        if (transcript) {
            emit("result", transcript);
        }
    });

    recognition.addEventListener("error", (e: SpeechRecognitionErrorEvent) => {
        console.error("voice error", e);
    });

    recognition.addEventListener("speechend", () => {
        recognition?.stop();
    });

    // "end" fires however the session finishes (result, stop, abort, or error),
    // so it is the single place where the listening state is reset.
    recognition.addEventListener("end", () => {
        isListening.value = false;
    });

    document.addEventListener("visibilitychange", handleVisibilityChange);
    // pagehide is dispatched on window, not document.
    window.addEventListener("pagehide", abortListening);
});

onUnmounted(() => {
    document.removeEventListener("visibilitychange", handleVisibilityChange);
    window.removeEventListener("pagehide", abortListening);
    recognition?.abort();
});
</script>

<template>
    <button
        data-voice-listen
        type="button"
        :disabled="!isSupported"
        :aria-pressed="isListening"
        aria-describedby="speech-status"
        :aria-label="isListening ? 'Stop voice input' : 'Start voice input'"
        class="mr-2 cursor-pointer rounded-full px-3 py-1.5 text-white disabled:cursor-not-allowed disabled:opacity-50"
        :class="
            isListening
                ? 'animate-pulse bg-orange-700 shadow-none'
                : 'bg-orange-500 shadow-sm hover:bg-orange-600'
        "
        @click="toggleListening"
    >
        <svg
            xmlns="http://www.w3.org/2000/svg"
            viewBox="0 0 24 24"
            stroke-width="1.5"
            stroke="currentColor"
            aria-hidden="true"
            class="-mt-0.5 inline-block h-5 w-5 fill-none"
        >
            <path
                stroke-linecap="round"
                stroke-linejoin="round"
                d="M12 18.75a6 6 0 0 0 6-6v-1.5m-6 7.5a6 6 0 0 1-6-6v-1.5m6 7.5v3.75m-3.75 0h7.5M12 15.75a3 3 0 0 1-3-3V4.5a3 3 0 1 1 6 0v8.25a3 3 0 0 1-3 3Z"
            />
        </svg>
    </button>
    <span id="speech-status" class="sr-only" aria-live="polite">{{
        isListening ? "Voice input is active" : "Voice input stopped"
    }}</span>
</template>
