A Voice Coding Assistant You Can Actually Talk To
AIDEN's voice orchestrator, “Talk to AIDEN,” adds voice control to its agentic IDE. Hold a key, describe what you want built, and AIDEN spawns a background coding agent to do it, then speaks the result back to you. It is built for developers who would rather delegate work by talking than by typing. There is no wake word and no always-on mic: it is pure push-to-talk, powered by OpenAI's Realtime API, and it runs on your own keys and your own machine.
What Is the Voice Orchestrator?
Most “voice” features in developer tools are dictation in disguise — you speak, it types the words into a box, and you go back to the keyboard. AIDEN's voice orchestrator is different. It is a real voice AI coding agent wired into an agentic IDE: when you talk to it, it can either act instantly inside the app or hand the work to a background agent that actually writes the code.
You open it with Cmd/Ctrl+Shift+V, hold Space to talk, and release to send. AIDEN replies in a natural, low-latency voice over WebRTC — powered by OpenAI's Realtime API (model gpt-realtime-2, with selectable voices like marin, cedar and alloy). Both your words and AIDEN's reply appear as live streaming captions, so you always have a transcript of the current session on screen. Change your mind mid-answer? Hold Space to barge in and start a new turn, or hit Esc to hard-stop a response.
This is the missing input mode for the agentic IDE. You already delegate features to AIDEN's agents on a kanban board; the voice orchestrator lets you kick those off — and steer them — with your voice.
How It Works: Voice to Code
Under the hood, AIDEN decides whether your request is a quick in-app action or real work that needs an agent. Here is the full path from spoken words to a delegated result.
“Hold Space, say what you want, release. Quick actions happen instantly; real coding gets delegated to a background agent — and when it's done, AIDEN reads the result back to you.”
Push-to-Talk
Toggle voice mode with Cmd/Ctrl+Shift+V, then hold Space and speak. Release to send your turn. There is no wake word and no always-on listening — activation is a keypress, so the mic is only live while you're holding the key. Your speech is transcribed live as streaming captions.
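The key handling described above can be pictured as a small state machine. The sketch below is illustrative only: the class, the mode names, and the logged event strings are hypothetical and are not AIDEN's actual implementation. It also assumes that a barge-in cancels the reply before the mic opens.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Mode(Enum):
    OFF = auto()        # voice mode closed
    IDLE = auto()       # voice mode open, mic muted
    LISTENING = auto()  # Space held, mic live
    SPEAKING = auto()   # a reply is playing


@dataclass
class PushToTalk:
    mode: Mode = Mode.OFF
    log: list[str] = field(default_factory=list)  # hypothetical event names

    def toggle(self) -> None:
        """Cmd/Ctrl+Shift+V: open or close voice mode."""
        self.mode = Mode.IDLE if self.mode is Mode.OFF else Mode.OFF

    def space_down(self) -> None:
        """Holding Space opens the mic; during a reply it barges in first."""
        if self.mode in (Mode.OFF, Mode.LISTENING):
            return  # voice mode closed, or a key-repeat while already held
        if self.mode is Mode.SPEAKING:
            self.log.append("cancel_reply")
        self.mode = Mode.LISTENING
        self.log.append("mic_on")

    def space_up(self) -> None:
        """Releasing Space mutes the mic and sends the turn."""
        if self.mode is Mode.LISTENING:
            self.log.append("mic_off")
            self.log.append("send_turn")
            self.mode = Mode.SPEAKING

    def escape(self) -> None:
        """Esc hard-stops a reply in progress."""
        if self.mode is Mode.SPEAKING:
            self.log.append("cancel_reply")
            self.mode = Mode.IDLE
```

The property worth noticing is that `mic_on` can only ever be logged from inside `space_down`, which mirrors the claim that the mic is live only while the key is held.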
Fast Tools or Delegate
For lightweight things — navigating the app, splitting or opening panels, a quick read — the voice model calls one of about three dozen curated fast tools directly and the action happens in-app immediately. For anything heavier (coding, research, file edits, browsing, automation) it calls delegate_task instead.
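One way to picture the split between fast tools and delegation is a simple dispatch table. This is a minimal sketch under stated assumptions: `delegate_task` is the only name taken from the text above, while the two fast tools, their arguments, and the return strings are hypothetical stand-ins.

```python
from typing import Callable

# Hypothetical fast tools. The real set is described only as
# "about three dozen curated fast tools".
FAST_TOOLS: dict[str, Callable[[dict], str]] = {
    "open_panel": lambda args: f"opened {args['panel']}",
    "split_panel": lambda args: f"split {args['direction']}",
}


def delegate_task(args: dict) -> str:
    """Stand-in for handing heavier work to a background agent."""
    return f"queued: {args['prompt']}"


def handle_tool_call(name: str, args: dict) -> str:
    """Run a fast tool in-app, or hand the request to delegate_task."""
    if name in FAST_TOOLS:
        return FAST_TOOLS[name](args)
    if name == "delegate_task":
        return delegate_task(args)
    raise ValueError(f"unknown tool: {name}")
```

In this picture the voice model itself chooses which tool to call; the handler only executes that choice.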
Background Agent Spawns
delegate_task spawns a background agent and routes the work to the best worker: Codex CLI for coding, Claude CLI for general tasks, with the OpenAI API as a fallback. Today the voice orchestrator runs one delegated agent at a time — separate from the many parallel agents you can run on AIDEN's kanban board.
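The routing and the single-agent limit can be sketched as follows. The worker identifiers, the task-kind labels, and the `VoiceDelegationSlot` class are assumptions made for illustration. The text does not say how unknown task kinds are handled, so this sketch treats them as general tasks.

```python
from dataclasses import dataclass
from typing import Optional

# Preferred worker per task kind, with the OpenAI API as the fallback.
# The string labels are illustrative, not AIDEN's internal names.
PREFERENCE = {
    "coding": ["codex_cli", "openai_api"],
    "general": ["claude_cli", "openai_api"],
}


def pick_worker(kind: str, available: set[str]) -> str:
    """Return the first preferred worker that is available."""
    for worker in PREFERENCE.get(kind, PREFERENCE["general"]):
        if worker in available:
            return worker
    raise RuntimeError("no worker available")


@dataclass
class VoiceDelegationSlot:
    """Holds at most one delegated task, matching the one-at-a-time limit."""
    current: Optional[str] = None

    def start(self, task_id: str) -> bool:
        if self.current is not None:
            return False  # a delegated agent is already running
        self.current = task_id
        return True

    def finish(self) -> None:
        self.current = None
```

A second `start` call returns `False` until `finish` runs, which is the behavior the single delegation slot implies.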
Approve Risky Actions
Destructive or risky actions — deleting, sending, deploying, calendar or account changes — are auto-flagged and paused. You see Approve and Decline buttons in the voice HUD and nothing runs until you say yes. Everything else proceeds without interruption.
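A minimal sketch of such a gate is shown below. How AIDEN actually flags an action is not described here, so the keyword match is a placeholder assumption, and `ask_user` stands in for the Approve and Decline buttons in the HUD.

```python
from typing import Callable

# Placeholder heuristic based on the categories named above.
RISKY_KEYWORDS = ("delete", "send", "deploy", "calendar", "account")


def needs_approval(action: str) -> bool:
    """Flag an action whose description mentions a risky category."""
    text = action.lower()
    return any(word in text for word in RISKY_KEYWORDS)


def run_action(
    action: str,
    execute: Callable[[str], None],
    ask_user: Callable[[str], bool],
) -> str:
    """Pause flagged actions until the user approves; run the rest directly."""
    if needs_approval(action) and not ask_user(action):
        return "declined"
    execute(action)
    return "done"
```

The gate sits in front of `execute`, so a declined action never reaches it, while unflagged actions pass through without a prompt.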
Monitor + Hear the Result
A top-bar HUD shows the running agent count and a live narration of the current step and tool, moving through queued → running → completed or failed. When the delegated task finishes, AIDEN speaks the result summary aloud — so you can keep your hands off the keyboard while it works.
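The status progression can be modeled as a small set of allowed transitions. In this sketch the `AgentRun` class and the `speak` callback are hypothetical, and it assumes a summary is spoken for both completed and failed runs, which the text above does not spell out.

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# queued -> running -> completed or failed; terminal states allow nothing.
ALLOWED = {
    Status.QUEUED: {Status.RUNNING},
    Status.RUNNING: {Status.COMPLETED, Status.FAILED},
    Status.COMPLETED: set(),
    Status.FAILED: set(),
}


class AgentRun:
    def __init__(self, speak: Callable[[str], None]) -> None:
        self.status = Status.QUEUED
        self.detail = ""    # current step while running, summary when finished
        self.speak = speak  # stand-in for text-to-speech

    def advance(self, new_status: Status, detail: str = "") -> None:
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} not allowed")
        self.status = new_status
        self.detail = detail
        if new_status in (Status.COMPLETED, Status.FAILED) and detail:
            self.speak(detail)

    def hud_line(self) -> str:
        """Text a HUD could show for this run."""
        if self.detail:
            return f"{self.status.value}: {self.detail}"
        return self.status.value
```

Rejecting invalid transitions keeps the HUD from ever showing a finished run as running again.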
The result is a genuine voice-to-code loop: you describe intent out loud, an agent turns it into real changes on your machine, and you stay in control of anything risky. It pairs naturally with parallel agents on git branches and spec-driven development.
What You Can Do By Voice
The voice orchestrator is built around a few honest, verified capabilities. Here is exactly what it does today.
Push-to-Talk Voice Mode
Toggle with Cmd/Ctrl+Shift+V, hold Space to talk, release to send. Natural, low-latency spoken replies over WebRTC using OpenAI's Realtime API, with selectable voices. Esc hard-stops any response.
Live Transcription
Both your speech and AIDEN's reply appear as streaming captions in real time, so every voice session has an on-screen transcript you can read as it happens.
Barge-In Interruption
Hold Space mid-reply to interrupt AIDEN and start a new turn instantly. No waiting for it to finish talking before you can redirect.
Instant In-App Actions
The voice model can call about three dozen curated fast tools directly — navigate the app, split or open panels, do quick reads — so lightweight actions happen the moment you ask.
Delegate to a Background Agent
Heavier work (coding, research, file edits, browsing, automation) is handed to a background agent via delegate_task, routed to Codex CLI, Claude CLI, or the OpenAI API depending on the job.
Approval Gates for Risky Actions
Deleting, sending, deploying, or changing calendar/account settings is auto-flagged. You Approve or Decline in the voice HUD before it runs — nothing destructive happens without your explicit yes.
Live Agent HUD
A top-bar HUD shows the running agent count and narrates the current step and tool, moving through queued → running → completed or failed so you always know what's happening.
Spoken Result Summaries
When a delegated task finishes, AIDEN reads the result summary aloud — so you can delegate work, look away from the keyboard, and hear when it's done.
Honest limits (so you know what to expect)
- Push-to-talk only. No wake word, no always-on listening — you hold Space to activate the mic.
- One delegated agent at a time. AIDEN's kanban board runs many agents in parallel, but voice delegation is limited to a single task at a time today.
- In-memory sessions. Voice sessions live in memory only — there's no saved voice transcript history across app restarts.
- Requirements. macOS 12 or later, at least one of Claude Code or Codex CLI installed, plus an OpenAI key for the Realtime voice layer.
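The requirements above can be checked with a short preflight script. This is a sketch under assumptions: it supposes the CLIs are on `PATH` as `claude` and `codex`, and that the OpenAI key is exposed as the `OPENAI_API_KEY` environment variable. AIDEN may store its keys elsewhere, so treat this as a rough check and not as the app's own logic.

```python
import os
import platform
import shutil
from typing import Callable, Mapping, Optional


def preflight(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
    mac_version: Optional[str] = None,
) -> list[str]:
    """Return a list of unmet requirements; an empty list means ready."""
    env = os.environ if env is None else env
    if mac_version is None:
        mac_version = platform.mac_ver()[0]  # empty string when not on macOS

    problems = []
    major = int(mac_version.split(".")[0]) if mac_version else 0
    if major < 12:
        problems.append("macOS 12 or later is required")
    # Assumed binary names for Claude Code and Codex CLI.
    if not (which("claude") or which("codex")):
        problems.append("install Claude Code or Codex CLI")
    # Assumed environment variable for the Realtime voice layer key.
    if not env.get("OPENAI_API_KEY"):
        problems.append("set an OpenAI key for the Realtime voice layer")
    return problems
```

Calling `preflight()` with no arguments inspects the current machine; the parameters exist so the check can be exercised with fake values.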
Voice vs Typing-Only Assistants
Typing-only AI coding assistants are excellent, and AIDEN is one of them by default — you can drive everything from the keyboard and the kanban board. The voice orchestrator simply adds a second input mode for the moments when talking is faster than typing: kicking off a task while your hands are busy, steering an agent mid-run, or thinking out loud about what to build next.
| Dimension | Typing-only assistant | AIDEN voice orchestrator |
|---|---|---|
| Input mode | Keyboard / chat box | Keyboard + push-to-talk voice |
| Kicking off work | Type out the request | Say it, hold Space, release |
| Getting results | Read the output | Read captions or hear it spoken aloud |
| Steering mid-run | Type a new message | Barge in by holding Space |
| Safety on risky actions | Depends on the tool | Approve / Decline gate in the HUD |
| Where the work runs | Your machine / your keys | Your machine / your keys |
The point isn't that voice replaces typing. It's that an AI pair programmer you can talk to removes friction at the exact moments the keyboard gets in the way. You still review every diff and still approve anything risky, and the work still runs on your own machine with your own keys.
Voice Orchestrator — FAQ
What is AIDEN's voice orchestrator?
Is it hands-free or always listening?
How does voice actually run my coding tasks?
Can it delete files or deploy without my permission?
How many agents can voice run at once?
What do I need to use voice, and does my code stay private?
Related Guides
What Is an Agentic IDE?
The multi-agent development model voice plugs into
Engineering with AI Agents
Patterns and workflows for agentic development
Parallel Agents & Git Worktrees
Run multiple agents on isolated branches simultaneously
Spec-Driven AI Development
Why specs beat prompts for agentic workflows
Claude Code Orchestration
GUI and workspace layer on top of Claude Code CLI
AI Kanban for Developers
Manage agent stories on a visual board
Talk to your code — for free
Download AIDEN, hold Space, and delegate your first coding task by voice. Free tier — one project, unlimited agents, no credit card.
macOS 12+ · Requires Claude Code or Codex CLI · Voice uses OpenAI Realtime (OpenAI key)