Quick start
Hold Right Option. Speak. Release. Your words land in whatever app has focus — a doc, a Slack message, a search box, anything that accepts text.
Apa is always running in your menu bar; there's no "Open the app" step. The Speechmatics mark in the menu bar shows what state it's in: jump to icons →
Start a session
Right Option has two behaviours, distinguished by how long you hold it.
| What you do | What happens |
|---|---|
| Hold (longer than ~300 ms) | Push-to-talk. Recording runs while the key is down, ends when you release. |
| Tap (press and release quickly) | Hands-free mode. Recording stays on until you tap again to end. |
Hands-free is useful for long dictation — meeting notes, essay drafts, voicemail-style replies. Tap once to start, speak as long as you like, tap once to finish.
Modes
Speech mode default
Apa adds punctuation automatically and removes filler words like "um", "uh", "ah". Casual writing — emails, docs, Slack, chat.
Legal mode
You dictate every comma and full stop out loud. Automatic punctuation is suppressed; spoken punctuation phrases are substituted in. Designed for legal and medical professionals who learnt dictation that way.
Switch modes by saying Apa legal mode / Apa speech mode, or pick from the tray's Mode submenu. The current mode persists across restarts.
Voice commands
Some commands aren't text — they change how Apa behaves. Trigger them by saying Apa at the start of a phrase. The word "apa" itself never gets typed.
| Say this | Effect |
|---|---|
Apa legal mode | Switch to legal-dictation mode (spoken punctuation). |
Apa speech mode | Switch back to normal speech mode. |
Apa caps lock on | EVERYTHING THAT FOLLOWS IS UPPERCASE until you turn it off. |
Apa caps lock off | Stop uppercasing. Normal case resumes. |
Apa polish [instruction] | After your dictation, send the typed text to an LLM for cleanup. More below → |
Say it however feels natural — "appa", "aha", "ah-pa" all work.
Emoji
In Speech mode, say any emoji name followed by the word emoji and the actual character is inserted.
"fire emoji" → 🔥 "clapping hands emoji" → 👏 "rocket emoji" → 🚀 "face with tears of joy emoji" → 😂 "thinking face emoji" → 🤔 "heart emoji" → ❤️ The phrase table is the full Unicode CLDR emoji set (~1,900 entries) plus common natural-speech aliases ("crying laughing" → 😂, "flame" → 🔥). Longer phrases are tried first, so "face with tears of joy emoji" matches before "tears of joy emoji" would.
Disabled in Legal mode — you're dictating punctuation aloud, not emoji.
Legal mode
In Legal mode, ASR-inserted punctuation is stripped first, then these phrases substitute for actual symbols.
Sentence enders
full stop | . |
question mark | ? |
exclamation mark | ! |
Mid-sentence punctuation
comma | , |
colon | : |
semicolon | ; |
hyphen | - |
dash | — |
ellipsis | … |
Brackets and quotes
open bracket / open parenthesis | ( |
close bracket / close parenthesis | ) |
open curly bracket | { |
close curly bracket | } |
open square bracket | [ |
close square bracket | ] |
open quote | “ |
close quote | ” |
Structure
new line | line break |
new paragraph | blank line + new paragraph |
tab | tab character |
Symbols
forward slash | / |
back slash / backslash | \ |
at sign | @ |
hash | # |
asterisk / star | * |
ampersand | & |
underscore | _ |
tilde | ~ |
caret | ^ |
plus sign | + |
equals sign | = |
percent | % |
dollar sign | $ |
pound sign | £ |
euro sign | € |
Set phrases
A small number of fixed legal idioms render in uppercase automatically:
without prejudice | WITHOUT PREJUDICE |
per se | PER SE |
Polish
Say Apa polish, optionally with an instruction, then end the session (release Right Option if you're holding, or tap it again if you're in hands-free). The text you just dictated is sent to an LLM and the result replaces it in place.
Polish is the last thing you say. Audio after the command is ignored; the LLM call fires when the session ends.
Apa polish— fix homophones, capitalisation, dictation artefacts; preserve voice.Apa polish make this more formalApa polish break into bullet pointsApa polish tighten this for an investor email
How it replaces text
Two paths, picked automatically per app:
- Accessibility API — atomic, in-place replacement. Used for apps that expose their text fields (most native apps, Chrome / Safari, Slack, modern Electron).
- Keyboard fallback — Apa selects the dictated text with Shift+← and re-types the replacement. Used for everything else, including terminals.
The keyboard fallback assumes the cursor stayed where Apa left it. If you clicked into a different field mid-session, Polish will land on the wrong text. The accessibility path doesn't have this constraint.
Setup
Polish is off by default. Enable it in ~/.config/dictate/config.json:
{
"polish_enabled": true,
"polish_api_key": "sk-…",
"polish_model": ""
} The API key can also come from the OPENAI_API_KEY environment variable. polish_model defaults to gpt-4o.
Status icons
The Speechmatics mark in your menu bar tells you what state Apa is in:
| Icon | State | What's happening |
|---|---|---|
| Idle | Signed in, permissions good, waiting for your hotkey. | |
| Recording | Mic is on, audio is being streamed to the ASR. | |
| Transcribing | Mic is off; the final transcript is being processed and typed. | |
| Polishing | An LLM rewrite is in progress. | |
| Signed out / session expired | Click the menu bar to sign in. | |
| Permissions needed | macOS Accessibility or Microphone access is missing — click the tray for one-click links. |
Permissions
Apa needs two macOS permissions before it can do anything. Both are granted in System Settings → Privacy & Security.
Accessibility
One permission covers two things: detecting your push-to-talk hotkey globally (via CGEventTap) and typing transcribed text into the focused app (via CGEventPost). macOS treats Accessibility as a superset of the more granular "Input Monitoring" permission, so a single toggle is enough.
Find Apa under Privacy & Security → Accessibility and switch it on.
Microphone
Required to capture audio for the ASR. The first time you start a session, macOS shows its standard mic-access prompt; subsequent sessions don't re-prompt. The grant is remembered per app per signature, so it survives upgrades as long as the build signing stays consistent.
Find Apa under Privacy & Security → Microphone.
While you're granting them
If either is missing on launch, the tray icon shows 🔒 and the menu adds clickable links straight to the relevant Settings panel. As soon as you flip a toggle on, the icon updates — no restart, no re-launch. If both are off, Apa opens both Settings panels.
If you revoke them
Toggling either permission off while Apa is running doesn't crash anything, but the next hotkey press will fail silently (hotkey events stop reaching the app without Accessibility; the mic refuses to open without Microphone). The tray flips back to 🔒 on the next permission check.
Troubleshooting
I pressed the hotkey and nothing happened.
Check the menu-bar icon. If it's 🔒 you're missing Accessibility access (open the tray to fix). If it's ○ you're signed out. If it's ⚠ your session expired — click sign-in. If everything looks fine but nothing types, see whether the focused window accepts text input — Apa types into whatever's focused, so a non-text window (Finder, an empty desktop click) will swallow the input.
Words appear and then get backspaced — is that a bug?
No, that's predictive output working as designed. The ASR sends a provisional guess in <200 ms, then a more accurate final transcript a moment later. Apa corrects in place so you get the speed of the guess and the accuracy of the final.
"Apa" keeps getting typed as plain text.
The wakeword must be the first word of the phrase. Mid-sentence "apa" is treated as literal text. Also: try saying it more deliberately — sound-alikes "appa", "aha", "ah-pa" are all accepted by the model.
Emoji insertion didn't trigger.
Three causes, in order of likelihood: (1) you're in Legal mode (switch with Apa speech mode); (2) you didn't say the word "emoji" at the end of the phrase; (3) the name isn't in the table — try a more common synonym.
Caps lock won't turn off.
Say Apa caps lock off at the start of a fresh phrase. If the ASR mis-hears it, the simplest fix is restarting a session — caps-lock state resets at the start of every session.