Documentation · v0.1.0

Speak. Done.

Every command, every phrase, every state.

Quick start

Hold Right Option. Speak. Release. Your words land in whatever app has focus — a doc, a Slack message, a search box, anything that accepts text.

Apa is always running in your menu bar, so there's no "Open the app" step. The Speechmatics mark in the menu bar shows what state it's in (see Status icons).

Start a session

Right Option has two behaviours, distinguished by how long you hold it.

What you do | What happens
Hold (longer than ~300 ms) | Push-to-talk. Recording runs while the key is down and ends when you release.
Tap (press and release quickly) | Hands-free mode. Recording stays on until you tap again to end.

Hands-free is useful for long dictation — meeting notes, essay drafts, voicemail-style replies. Tap once to start, speak as long as you like, tap once to finish.
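The hold/tap distinction comes down to a single duration check when the key is released. The sketch below is illustrative only: `HOLD_THRESHOLD_S` and `classify_press` are hypothetical names, and the threshold is taken from the approximate "~300 ms" figure above rather than from Apa's actual source.

```python
# Approximate threshold from the table above; the real value may differ.
HOLD_THRESHOLD_S = 0.300


def classify_press(pressed_at: float, released_at: float) -> str:
    """Classify one Right Option press by how long the key was held.

    Times are in seconds (for example, values from time.monotonic()).
    Returns "push_to_talk" for a hold and "hands_free" for a quick tap.
    """
    held_for = released_at - pressed_at
    return "push_to_talk" if held_for > HOLD_THRESHOLD_S else "hands_free"
```

In this sketch a 0.1 s press is classified as a tap (hands-free), and a 1.5 s press as a hold (push-to-talk).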

Modes

💬 Speech mode (default)

Apa adds punctuation automatically and removes filler words like "um", "uh", "ah". Best for casual writing: emails, docs, Slack, chat.

⚖︎ Legal mode

You dictate every comma and full stop out loud. Automatic punctuation is suppressed, and spoken punctuation phrases are replaced with the corresponding marks. Designed for legal and medical professionals who learnt dictation that way.

Switch modes by saying Apa legal mode / Apa speech mode, or pick from the tray's Mode submenu. The current mode persists across restarts.

Voice commands

Some commands aren't text — they change how Apa behaves. Trigger them by saying Apa at the start of a phrase. The word "apa" itself never gets typed.

Say this | Effect
Apa legal mode | Switch to legal-dictation mode (spoken punctuation).
Apa speech mode | Switch back to normal speech mode.
Apa caps lock on | EVERYTHING THAT FOLLOWS IS UPPERCASE until you turn it off.
Apa caps lock off | Stop uppercasing. Normal case resumes.
Apa polish [instruction] | After your dictation, send the typed text to an LLM for cleanup. See the Polish section below.

Say it however feels natural — "appa", "aha", "ah-pa" all work.
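As a rough illustration of the rule that the wakeword only counts at the start of a phrase, here is a minimal parser sketch. The names `WAKEWORD_VARIANTS`, `COMMANDS`, and `parse_phrase` are hypothetical, the variant list is just the sound-alikes mentioned above, and returning `"unknown"` for an unrecognised command is an assumption; the documentation does not say how Apa handles that case.

```python
# Sound-alikes listed in this documentation; the real recogniser may accept more.
WAKEWORD_VARIANTS = {"apa", "appa", "aha", "ah-pa"}
COMMANDS = ("legal mode", "speech mode", "caps lock on", "caps lock off", "polish")


def parse_phrase(phrase: str):
    """Return (command, argument) when the phrase starts with the wakeword.

    Returns None when there is no leading wakeword, meaning the phrase
    should be treated as ordinary dictated text.
    """
    words = phrase.strip().split()
    if not words or words[0].lower().strip(".,!?") not in WAKEWORD_VARIANTS:
        return None
    rest = words[1:]
    lowered = [w.lower().strip(".,!?") for w in rest]
    for command in COMMANDS:
        parts = command.split()
        if lowered[: len(parts)] == parts:
            # Anything after the command words is the optional argument,
            # e.g. the instruction that follows "polish".
            return command, " ".join(rest[len(parts):])
    return "unknown", " ".join(rest)  # assumption: unspecified in the docs
```

For example, `parse_phrase("appa polish make this more formal")` yields the `polish` command with `make this more formal` as its argument, while `parse_phrase("I met apa yesterday")` returns `None` because the wakeword is mid-sentence.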

Emoji

In Speech mode, say any emoji name followed by the word emoji and the actual character is inserted.

"fire emoji" 🔥
"clapping hands emoji" 👏
"rocket emoji" 🚀
"face with tears of joy emoji" 😂
"thinking face emoji" 🤔
"heart emoji" ❤️

The phrase table is the full Unicode CLDR emoji set (~1,900 entries) plus common natural-speech aliases ("crying laughing" → 😂, "flame" → 🔥). Longer phrases are tried first, so "face with tears of joy emoji" matches before "tears of joy emoji" would.

Disabled in Legal mode — you're dictating punctuation aloud, not emoji.
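The longest-phrase-first rule can be sketched as a look-back from each spoken "emoji" keyword. This is an illustration under stated assumptions, not Apa's implementation: `EMOJI_TABLE` is a tiny hypothetical stand-in for the full phrase table, the "blue heart" entry is added here only to show the precedence rule, and `insert_emoji` and its `mode` parameter are invented names. Punctuation attached to words is ignored for brevity.

```python
# Tiny stand-in for the real phrase table (~1,900 entries plus aliases).
EMOJI_TABLE = {
    "face with tears of joy": "😂",
    "crying laughing": "😂",
    "fire": "🔥",
    "flame": "🔥",
    "rocket": "🚀",
    "heart": "❤️",
    "blue heart": "💙",  # added for this example to show longest-match-first
}
MAX_WORDS = max(len(phrase.split()) for phrase in EMOJI_TABLE)


def insert_emoji(text: str, mode: str = "speech") -> str:
    """Replace "<name> emoji" with the character, preferring longer names."""
    if mode == "legal":
        return text  # emoji insertion is disabled in Legal mode
    out = []
    for word in text.split():
        if word.lower() != "emoji":
            out.append(word)
            continue
        # Try the longest run of preceding words first, then shorter runs.
        for n in range(min(MAX_WORDS, len(out)), 0, -1):
            phrase = " ".join(out[-n:]).lower()
            if phrase in EMOJI_TABLE:
                del out[-n:]
                out.append(EMOJI_TABLE[phrase])
                break
        else:
            out.append(word)  # no known name precedes it: keep the word
    return " ".join(out)
```

With this table, "send a blue heart emoji" becomes "send a 💙" because the two-word name is tried before the one-word "heart".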

Polish

Say Apa polish, optionally with an instruction, then end the session (release Right Option if you're holding, or tap it again if you're in hands-free). The text you just dictated is sent to an LLM and the result replaces it in place.

Polish is the last thing you say. Audio after the command is ignored; the LLM call fires when the session ends.

Examples
  • Apa polish — fix homophones, capitalisation, dictation artefacts; preserve voice.
  • Apa polish make this more formal
  • Apa polish break into bullet points
  • Apa polish tighten this for an investor email

How it replaces text

Two paths, picked automatically per app:

  • Accessibility API — atomic, in-place replacement. Used for apps that expose their text fields (most native apps, Chrome / Safari, Slack, modern Electron).
  • Keyboard fallback — Apa selects the dictated text with Shift+← and re-types the replacement. Used for everything else, including terminals.

The keyboard fallback assumes the cursor stayed where Apa left it. If you clicked into a different field mid-session, Polish will land on the wrong text. The accessibility path doesn't have this constraint.
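The difference between the two paths can be sketched as a small planning function. Everything here is hypothetical: `plan_replacement`, the step names, and the assumption that the fallback presses Shift+← once per character of the dictated text. Note that Python's `len` counts code points, so emoji and combined characters would need more careful handling in practice.

```python
def plan_replacement(dictated: str, polished: str, has_accessibility: bool):
    """Return the list of steps needed to swap dictated text for polished text."""
    if has_accessibility:
        # One atomic, in-place replacement; cursor position does not matter.
        return [("ax_replace", polished)]
    # Keyboard fallback: extend the selection leftwards over the dictated
    # text, then type over it. This relies on the cursor not having moved.
    steps = [("key", "shift+left")] * len(dictated)
    steps.append(("type", polished))
    return steps
```

The fallback plan makes the cursor constraint visible: the selection is counted backwards from wherever the cursor currently is, so a moved cursor means the wrong text gets selected.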

Setup

Polish is off by default. Enable it in ~/.config/dictate/config.json:

{
  "polish_enabled": true,
  "polish_api_key": "sk-…",
  "polish_model": ""
}

The API key can also come from the OPENAI_API_KEY environment variable. Leave polish_model empty, as shown, to use the default, gpt-4o.
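To make the fallbacks concrete, here is a sketch of how those three settings might be resolved. The function names are hypothetical, and two behaviours are assumptions rather than documented facts: that a key in the config file takes precedence over the environment variable, and that an empty `polish_model` string is treated the same as a missing one.

```python
import json
import os
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "dictate" / "config.json"
DEFAULT_MODEL = "gpt-4o"


def resolve_polish_settings(raw: dict, env=os.environ) -> dict:
    """Apply the documented defaults to a parsed config dictionary."""
    return {
        # Polish is off unless explicitly enabled.
        "enabled": bool(raw.get("polish_enabled", False)),
        # Assumption: the config value wins, the environment is the fallback.
        "api_key": raw.get("polish_api_key") or env.get("OPENAI_API_KEY", ""),
        # Assumption: empty or missing both mean "use the default model".
        "model": raw.get("polish_model") or DEFAULT_MODEL,
    }


def load_polish_settings(path: Path = CONFIG_PATH) -> dict:
    """Read the config file if it exists and resolve the Polish settings."""
    try:
        raw = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        raw = {}
    return resolve_polish_settings(raw)
```

With the example config above and no key in the file, the sketch would pick up the key from `OPENAI_API_KEY` and use `gpt-4o` as the model.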

Status icons

The Speechmatics mark in your menu bar tells you what state Apa is in:

State | What's happening
Idle | Signed in, permissions good, waiting for your hotkey.
Recording | Mic is on, audio is being streamed to the ASR.
Transcribing | Mic is off; the final transcript is being processed and typed.
Polishing | An LLM rewrite is in progress.
Signed out / session expired | Click the menu bar to sign in.
Permissions needed | macOS Accessibility or Microphone access is missing; click the tray for one-click links.

Permissions

Apa needs two macOS permissions before it can do anything. Both are granted in System Settings → Privacy & Security.

Accessibility

One permission covers two things: detecting your push-to-talk hotkey globally (via CGEventTap) and typing transcribed text into the focused app (via CGEventPost). macOS treats Accessibility as a superset of the more granular "Input Monitoring" permission, so a single toggle is enough.

Find Apa under Privacy & Security → Accessibility and switch it on.

Microphone

Required to capture audio for the ASR. The first time you start a session, macOS shows its standard mic-access prompt; subsequent sessions don't re-prompt. The grant is remembered per app per signature, so it survives upgrades as long as the build signing stays consistent.

Find Apa under Privacy & Security → Microphone.

While you're granting them

If either is missing on launch, the tray icon shows 🔒 and the menu adds clickable links straight to the relevant Settings panel. As soon as you flip a toggle on, the icon updates — no restart, no re-launch. If both are off, Apa opens both Settings panels.

If you revoke them

Toggling either permission off while Apa is running doesn't crash anything, but the next hotkey press will fail silently (hotkey events stop reaching the app without Accessibility; the mic refuses to open without Microphone). The tray flips back to 🔒 on the next permission check.

Troubleshooting

I pressed the hotkey and nothing happened.

Check the menu-bar icon. If it's 🔒, a permission is missing: Accessibility or Microphone (open the tray to fix). If it's ○, you're signed out. If it's ⚠, your session expired; click to sign in. If everything looks fine but nothing types, check whether the focused window accepts text input. Apa types into whatever has focus, so a non-text window (Finder, an empty desktop click) will swallow the input.

Words appear and then get backspaced — is that a bug?

No, that's predictive output working as designed. The ASR sends a provisional guess in <200 ms, then a more accurate final transcript a moment later. Apa corrects in place so you get the speed of the guess and the accuracy of the final.
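One way to picture the in-place correction is as a shared-prefix diff between the provisional and final text: keep what already matches, backspace the rest, and type the remainder. This is a sketch of that idea only; `correction` is a hypothetical name, and whether Apa computes a minimal diff like this is an assumption.

```python
import os.path


def correction(provisional: str, final: str) -> tuple[int, str]:
    """Return (backspaces, text_to_type) that turns provisional into final."""
    # commonprefix compares character by character, so it works on any strings.
    common = len(os.path.commonprefix([provisional, final]))
    return len(provisional) - common, final[common:]
```

For a provisional "meet at for" and a final "meet at four pm", the shared prefix is "meet at fo", so the sketch backspaces one character and types "ur pm".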

"Apa" keeps getting typed as plain text.

The wakeword must be the first word of the phrase. Mid-sentence "apa" is treated as literal text. If it still isn't recognised at the start of a phrase, try saying it more deliberately; the sound-alikes "appa", "aha", and "ah-pa" are all accepted by the model.

Emoji insertion didn't trigger.

Three causes, in order of likelihood: (1) you're in Legal mode (switch with Apa speech mode); (2) you didn't say the word "emoji" at the end of the phrase; (3) the name isn't in the table — try a more common synonym.

Caps lock won't turn off.

Say Apa caps lock off at the start of a fresh phrase. If the ASR mis-hears it, the simplest fix is restarting a session — caps-lock state resets at the start of every session.
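The reason a fresh session clears a stuck caps lock can be shown with a small state sketch. The `Session` class and its methods are hypothetical; the only behaviour taken from the documentation is that caps-lock state starts off at the beginning of every session.

```python
class Session:
    """Hypothetical per-session state, created anew for every recording."""

    def __init__(self) -> None:
        # Always starts off, so starting a new session clears a stuck caps lock.
        self.caps_lock = False

    def handle_command(self, command: str) -> None:
        if command == "caps lock on":
            self.caps_lock = True
        elif command == "caps lock off":
            self.caps_lock = False

    def render(self, text: str) -> str:
        """Apply the current caps-lock state to a piece of transcript."""
        return text.upper() if self.caps_lock else text
```

After `handle_command("caps lock on")`, a session renders "hello" as "HELLO", while a newly created `Session()` renders it unchanged.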