New in AIDEN

A Voice Coding Assistant You Can Actually Talk To

By Kylian Migot · July 5, 2026 · 8 min read

AIDEN's voice orchestrator, “Talk to AIDEN”, turns its agentic IDE into one you can drive by voice. Hold a key, describe what you want built, and AIDEN spawns a background coding agent to do it, then speaks the result back to you. It is built for developers who would rather delegate work by talking than by typing. No wake word, no always-on mic: pure push-to-talk, powered by OpenAI's Realtime API, running on your own keys and your own machine.

In this guide

1. What Is the Voice Orchestrator?
2. How It Works: Voice to Code
3. What You Can Do By Voice
4. Voice vs Typing-Only Assistants
5. FAQ

What Is the Voice Orchestrator?

Most “voice” features in developer tools are dictation in disguise — you speak, it types the words into a box, and you go back to the keyboard. AIDEN's voice orchestrator is different. It is a real voice AI coding agent wired into an agentic IDE: when you talk to it, it can either act instantly inside the app or hand the work to a background agent that actually writes the code.

You open it with Cmd/Ctrl+Shift+V, hold Space to talk, and release to send. AIDEN replies in a natural, low-latency voice over WebRTC — powered by OpenAI's Realtime API (model gpt-realtime-2, with selectable voices like marin, cedar and alloy). Both your words and AIDEN's reply appear as live streaming captions, so you always have a transcript of the current session on screen. Change your mind mid-answer? Hold Space to barge in and start a new turn, or hit Esc to hard-stop a response.

This is the missing input mode for the agentic IDE. You already delegate features to AIDEN's agents on a kanban board; the voice orchestrator lets you kick those off — and steer them — with your voice.

How It Works: Voice to Code

Under the hood, AIDEN decides whether your request is a quick in-app action or real work that needs an agent. Here is the full path from spoken words to a delegated result.

“Hold Space, say what you want, release. Quick actions happen instantly; real coding gets delegated to a background agent — and when it's done, AIDEN reads the result back to you.”

Push-to-Talk

Toggle voice mode with Cmd/Ctrl+Shift+V, then hold Space and speak. Release to send your turn. There is no wake word and no always-on listening — activation is a keypress, so the mic is only live while you're holding the key. Your speech is transcribed live as streaming captions.
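The turn cycle described here can be pictured as a small state machine. The sketch below is only an illustration under stated assumptions: the class, state, and method names are hypothetical and are not AIDEN's code. It models the mic being live only while Space is held, a release sending the turn, a Space press mid-reply acting as a barge-in, and Esc stopping a reply.

```python
from enum import Enum, auto


class VoiceState(Enum):
    IDLE = auto()       # voice mode on, mic closed
    LISTENING = auto()  # Space held, mic live
    SPEAKING = auto()   # AIDEN is replying


class PushToTalk:
    """Hypothetical model of the push-to-talk turn cycle (not AIDEN's code)."""

    def __init__(self):
        self.state = VoiceState.IDLE
        self.sent_turns = []

    def space_down(self):
        # Holding Space always opens the mic; mid-reply this is a barge-in.
        self.state = VoiceState.LISTENING

    def space_up(self, transcript):
        # Releasing Space sends the turn, after which the reply begins.
        if self.state is VoiceState.LISTENING:
            self.sent_turns.append(transcript)
            self.state = VoiceState.SPEAKING

    def escape(self):
        # Esc hard-stops a reply in progress.
        if self.state is VoiceState.SPEAKING:
            self.state = VoiceState.IDLE

    @property
    def mic_live(self):
        return self.state is VoiceState.LISTENING
```

Because the mic state is derived from a single key state, there is no path in this model where audio is captured without the key being held.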

Fast Tools or Delegate

For lightweight things — navigating the app, splitting or opening panels, a quick read — the voice model calls one of about three dozen curated fast tools directly and the action happens in-app immediately. For anything heavier (coding, research, file edits, browsing, automation) it calls delegate_task instead.
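As a rough illustration of that split, the sketch below dispatches a tool call either to an in-app fast tool or to a delegation callback. Only the name delegate_task comes from this article. The two fast tools, their arguments, and the handle_tool_call function are hypothetical stand-ins, since the real set of fast tools is not listed here.

```python
# Hypothetical fast tools; the real set of about three dozen is not listed here.
FAST_TOOLS = {
    "open_panel": lambda args: f"opened {args['panel']} panel",
    "split_panel": lambda args: f"split panel {args['direction']}",
}


def handle_tool_call(name, args, delegate):
    """Run a fast tool in-app, or hand heavier work to the delegate callback."""
    if name in FAST_TOOLS:
        # Lightweight action: run it directly and return the result.
        return {"kind": "fast", "result": FAST_TOOLS[name](args)}
    if name == "delegate_task":
        # Heavier work: pass the description on and return a task handle.
        return {"kind": "delegated", "task_id": delegate(args["description"])}
    raise ValueError(f"unknown tool: {name}")
```

In this sketch the choice of tool is made by the voice model. The dispatcher only routes the call it receives.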

Background Agent Spawns

delegate_task spawns a background agent and routes the work to the best worker: Codex CLI for coding, Claude CLI for general tasks, with the OpenAI API as a fallback. Today the voice orchestrator runs one delegated agent at a time — separate from the many parallel agents you can run on AIDEN's kanban board.
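The routing rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not AIDEN's implementation: the task-kind labels, the executable names codex and claude, the openai-api label, and the pick_worker function are all hypothetical. The sketch also does not model trying the other CLI before falling back to the API.

```python
import shutil

# Assumed executable names for the two CLIs; adjust to your installation.
PREFERRED_WORKER = {"coding": "codex", "general": "claude"}


def pick_worker(kind, is_installed=None):
    """Return the preferred worker for a task kind, else the API fallback."""
    if is_installed is None:
        # Default check: is the executable on PATH?
        is_installed = lambda exe: shutil.which(exe) is not None
    # Anything that is not coding is treated as a general task here.
    preferred = PREFERRED_WORKER.get(kind, PREFERRED_WORKER["general"])
    if is_installed(preferred):
        return preferred
    return "openai-api"
```

Passing is_installed as a parameter keeps the rule testable without touching the real PATH.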

Approve Risky Actions

Destructive or risky actions (deleting, sending, deploying, calendar or account changes) are auto-flagged and paused. You see Approve and Decline buttons in the voice HUD, and nothing runs until you approve. Everything else proceeds without interruption.
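One way to picture the gate is an action object that refuses to run until a decision arrives. The sketch below is an assumption-laden illustration: how AIDEN actually detects risky actions is not described in this article, so the category labels, the PendingAction class, and its status strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical labels for the risky kinds of action named above.
RISKY_CATEGORIES = {"delete", "send", "deploy", "calendar_change", "account_change"}


@dataclass
class PendingAction:
    category: str
    run: Callable[[], str]
    status: str = "pending"
    result: Optional[str] = None

    @property
    def needs_approval(self):
        return self.category in RISKY_CATEGORIES

    def start(self):
        # Flagged actions pause for a decision; everything else runs at once.
        if self.needs_approval:
            self.status = "awaiting_approval"
        else:
            self._execute()

    def approve(self):
        if self.status == "awaiting_approval":
            self._execute()

    def decline(self):
        if self.status == "awaiting_approval":
            self.status = "declined"

    def _execute(self):
        self.result = self.run()
        self.status = "completed"
```

A declined action stays declined in this model: a later approve call does nothing, so the side effect never runs.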

Monitor + Hear the Result

A top-bar HUD shows the running agent count and a live narration of the current step and tool, moving through queued → running → completed or failed. When the delegated task finishes, AIDEN speaks the result summary aloud — so you can keep your hands off the keyboard while it works.
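The status progression lends itself to a small transition table. The sketch below is illustrative only: the four status names come from this article, while the DelegatedTask class, its fields, and the hud_line format are hypothetical.

```python
# Allowed status transitions: queued -> running -> completed or failed.
TRANSITIONS = {
    "queued": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


class DelegatedTask:
    """Hypothetical record of one delegated task as a HUD might track it."""

    def __init__(self, description):
        self.description = description
        self.status = "queued"
        self.step = ""

    def advance(self, new_status, step=""):
        # Reject any move the table does not allow, such as reviving a finished task.
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot move from {self.status} to {new_status}")
        self.status = new_status
        self.step = step

    def hud_line(self):
        # One-line narration: status, task, and the current step if there is one.
        suffix = f": {self.step}" if self.step else ""
        return f"[{self.status}] {self.description}{suffix}"
```

Treating completed and failed as terminal states gives the spoken summary a single, unambiguous trigger point.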

The result is a genuine voice-to-code loop: you describe intent out loud, an agent turns it into real changes on your machine, and you stay in control of anything risky. It pairs naturally with parallel agents on git branches and spec-driven development.

What You Can Do By Voice

The voice orchestrator is built around a small set of capabilities. Here is exactly what it does today.

Push-to-Talk Voice Mode

Toggle with Cmd/Ctrl+Shift+V, hold Space to talk, release to send. Natural, low-latency spoken replies over WebRTC using OpenAI's Realtime API, with selectable voices. Esc hard-stops any response.

Live Transcription

Both your speech and AIDEN's reply appear as streaming captions in real time, so every voice session has an on-screen transcript you can read as it happens.

Barge-In Interruption

Hold Space mid-reply to interrupt AIDEN and start a new turn instantly. No waiting for it to finish talking before you can redirect.

Instant In-App Actions

The voice model can call about three dozen curated fast tools directly — navigate the app, split or open panels, do quick reads — so lightweight actions happen the moment you ask.

Delegate to a Background Agent

Heavier work (coding, research, file edits, browsing, automation) is handed to a background agent via delegate_task, routed to Codex CLI, Claude CLI, or the OpenAI API depending on the job.

Approval Gates for Risky Actions

Deleting, sending, deploying, or changing calendar/account settings is auto-flagged. You Approve or Decline in the voice HUD before it runs — nothing destructive happens without your explicit yes.

Live Agent HUD

A top-bar HUD shows the running agent count and narrates the current step and tool, moving through queued → running → completed or failed so you always know what's happening.

Spoken Result Summaries

When a delegated task finishes, AIDEN reads the result summary aloud — so you can delegate work, look away from the keyboard, and hear when it's done.

Honest limits (so you know what to expect)

Push-to-talk only. No wake word, no always-on listening — you hold Space to activate the mic.
One delegated agent at a time. AIDEN's kanban board runs many agents in parallel, but the voice delegation slot is single-agent today.
In-memory sessions. Voice sessions live in memory only — there's no saved voice transcript history across app restarts.
Requirements. macOS 12 or later, at least one of Claude Code or Codex CLI installed, plus an OpenAI key for the Realtime voice layer.

Voice vs Typing-Only Assistants

Typing-only AI coding assistants are excellent, and AIDEN is one of them by default — you can drive everything from the keyboard and the kanban board. The voice orchestrator simply adds a second input mode for the moments when talking is faster than typing: kicking off a task while your hands are busy, steering an agent mid-run, or thinking out loud about what to build next.

Dimension	Typing-only assistant	AIDEN voice orchestrator
Input mode	Keyboard / chat box	Keyboard + push-to-talk voice
Kicking off work	Type out the request	Hold Space, say it, release
Getting results	Read the output	Read captions or hear it spoken aloud
Steering mid-run	Type a new message	Barge in by holding Space
Safety on risky actions	Depends on the tool	Approve / Decline gate in the HUD
Where the work runs	Your machine / your keys	Your machine / your keys

The point isn't that voice replaces typing — it's that an AI pair programmer you can talk to removes friction at the exact moments the keyboard gets in the way. You still review every diff, still approve anything risky, and your code still never leaves your machine.

Voice Orchestrator — FAQ

What is AIDEN's voice orchestrator?

AIDEN's voice orchestrator — called "Talk to AIDEN" in the app — is a voice-controlled interface for its agentic IDE. You toggle voice mode with Cmd/Ctrl+Shift+V, hold Space to talk, and release to send. For quick in-app actions the voice model responds instantly; for real coding work it delegates to a background agent and reads the result back aloud.

Is it hands-free or always listening?

Neither. AIDEN's voice orchestrator is push-to-talk: there is no wake word and no always-on listening. You hold Space to speak and release to send. Once a coding task is delegated, though, you can step away from the keyboard while the background agent runs.

How does voice actually run my coding tasks?

When you ask for something heavier than a quick in-app action, the voice model calls delegate_task and spawns a background agent. AIDEN routes the work to the best worker — Codex CLI for coding, Claude CLI for general work, with the OpenAI API as a fallback. You watch progress in a top-bar HUD and hear the summary spoken aloud when it finishes.

Can it delete files or deploy without my permission?

No. Destructive or risky actions (deleting, sending, deploying, calendar or account changes) are auto-flagged and require your explicit approval. You click Approve or Decline in the voice HUD before anything runs.

How many agents can voice run at once?

Today the voice orchestrator runs one delegated background agent at a time. AIDEN's broader kanban board runs many agents in parallel across separate git branches, but the voice delegation slot is single-agent for now.

What do I need to use voice, and does my code stay private?

You need macOS 12 or later and at least one of Claude Code or Codex CLI installed, the same as the rest of AIDEN, plus an OpenAI key for the voice layer. AIDEN runs on your machine with your own keys: the voice layer calls OpenAI's Realtime API with your OpenAI key, delegated coding uses your already-installed CLIs, and your code stays local.

Talk to your code — for free

Download AIDEN, hold Space, and delegate your first coding task by voice. Free tier — one project, unlimited agents, no credit card.
