Double-clap. Jarvis wakes up, greets you with the weather and your tasks, answers your questions with dry British wit, controls your browser, and sees your screen.
Built entirely with Claude Code — no code written manually.
- Double-Clap Trigger — Clap twice and your entire workspace launches: Spotify, VS Code, Obsidian, Chrome with Jarvis UI
- Voice Conversation — Speak freely with Jarvis through your microphone. He listens, thinks, and responds with voice
- Sarcastic British Butler — Jarvis speaks German with the personality of Tony Stark's AI: dry, witty, and always one step ahead
- Weather & Tasks — On startup, Jarvis greets you with the current weather and a humorous summary of your open tasks from Obsidian
- Browser Automation — "Search for MiroFish" → Jarvis opens a real browser, navigates to the page, reads the content, and summarizes it for you
- Screen Vision — "What's on my screen?" → Jarvis takes a screenshot, analyzes it with Claude Vision, and describes what he sees
- World News — "What's happening in the world?" → Jarvis opens worldmonitor.app and summarizes current global events
- Window Snapping — All launched apps automatically snap into quadrants on your screen
You (speak) → Chrome Browser (Web Speech API) → FastAPI Server (local)
↓
Claude Haiku (thinks)
↓
┌──────────────────┼───────────────────┐
↓ ↓ ↓
ElevenLabs TTS Playwright Browser Screen Capture
(speaks back) (searches/opens) (Claude Vision)
↓
Audio → Chrome → You (hear)
| Component | Technology | Purpose |
|---|---|---|
| Speech Input | Web Speech API (Chrome) | Converts your voice to text |
| Server | FastAPI (Python) | Local orchestration — runs on your machine |
| Brain | Claude Haiku (Anthropic) | Thinks, decides, formulates responses |
| Voice | ElevenLabs TTS | Converts text to natural German speech |
| Browser Control | Playwright | Automates a real browser you can see |
| Screen Vision | Claude Vision + Pillow | Screenshots and describes your screen |
| Clap Detection | sounddevice + numpy | Listens for double-clap to launch everything |
| Window Management | PowerShell + Win32 API | Snaps windows into screen quadrants |
- Windows 10/11
- Python 3.10+
- Google Chrome
- Claude Code (recommended for setup)
| Service | What For | Cost | Link |
|---|---|---|---|
| Anthropic | Claude Haiku (the brain) | ~$0.25 / 1M tokens | console.anthropic.com |
| ElevenLabs | Voice (text-to-speech) | Free tier: 10k chars/month | elevenlabs.io |
-
Clone the repo:
git clone https://github.com/Julian-Ivanov/jarvis-voice-assistant.git cd jarvis-voice-assistant -
Open in VS Code, start Claude Code, and say:
Set up Jarvis for me. -
Claude Code will ask for your API keys, name, preferences, and configure everything automatically.
-
Clone and install dependencies:
git clone https://github.com/Julian-Ivanov/jarvis-voice-assistant.git cd jarvis-voice-assistant pip install -r requirements.txt playwright install chromium -
Create
config.jsonfrom the template:cp config.example.json config.json
-
Edit
config.jsonwith your API keys and preferences:{ "anthropic_api_key": "sk-ant-...", "elevenlabs_api_key": "sk_...", "elevenlabs_voice_id": "YOUR_VOICE_ID", "user_name": "Your Name", "user_address": "Sir", "city": "Hamburg", "workspace_path": "C:\\path\\to\\jarvis-voice-assistant", "spotify_track": "spotify:track:YOUR_TRACK_ID", "browser_url": "https://your-website.com", "obsidian_inbox_path": "C:\\path\\to\\obsidian\\inbox", "apps": ["obsidian://open"] } -
Start Jarvis:
python server.py
-
Open Chrome and go to
http://localhost:8340 -
Click anywhere on the page, then speak!
python server.pyThen open http://localhost:8340 in Chrome.
python scripts/clap-trigger.pyClap twice → Spotify plays your song, VS Code opens, Obsidian opens, Chrome opens with Jarvis. All windows snap into quadrants.
- Open Task Scheduler (
Win + R→taskschd.msc) - Create Task → Trigger: "At log on"
- Action:
powershellwith argument:-ExecutionPolicy Bypass -Command "python C:\path\to\scripts\clap-trigger.py"
| Command | What Happens |
|---|---|
| "Good morning, Jarvis" | Jarvis greets you with weather + tasks |
| "Search for AI news" | Opens browser, searches, summarizes results |
| "Open skool.com" | Opens the URL in your browser |
| "What's on my screen?" | Takes screenshot, describes what he sees |
| "What's happening in the world?" | Opens worldmonitor.app, summarizes global news |
| Any question | Jarvis answers in his sarcastic butler style |
jarvis-voice-assistant/
├── server.py # FastAPI backend — the brain
├── browser_tools.py # Playwright browser automation
├── screen_capture.py # Screenshot + Claude Vision
├── config.json # Your personal config (gitignored)
├── config.example.json # Template for new users
├── requirements.txt # Python dependencies
├── frontend/
│ ├── index.html # Jarvis web UI
│ ├── main.js # Speech recognition + WebSocket + audio
│ └── style.css # Dark theme with animated orb
├── scripts/
│ ├── clap-trigger.py # Double-clap detection
│ └── launch-session.ps1 # Launches all apps + window snapping
├── CLAUDE.md # Instructions for Claude Code
└── SETUP.md # Detailed setup guide
Edit the system prompt in server.py → build_system_prompt(). The personality, greeting behavior, and action instructions are all defined there.
Edit config.json:
{
"spotify_track": "spotify:track:YOUR_TRACK_ID",
"browser_url": "https://your-website.com",
"apps": ["obsidian://open", "slack://"]
}Find a voice on elevenlabs.io, copy the Voice ID, and set it in config.json:
{
"elevenlabs_voice_id": "YOUR_VOICE_ID"
}{
"city": "Berlin"
}In scripts/clap-trigger.py:
THRESHOLD = 0.15 # Lower = more sensitive
MAX_GAP = 1.2 # Seconds between claps| Problem | Solution |
|---|---|
| Jarvis doesn't speak | Check if server is running. Kill old process: taskkill /f /im python.exe then restart |
| "Connection lost" in browser | Old server still running on port 8340. Kill it and restart |
| Clap not detected | Lower THRESHOLD in clap-trigger.py (try 0.10) |
| Browser search fails | Run playwright install chromium |
| No audio in Chrome | Click anywhere on the page first (Chrome autoplay policy) |
| Jarvis says "Sir planen" instead of "Sie planen" | Update the system prompt grammar rules in server.py |
This template is built for Windows. If you're on macOS, clone the repo and tell Claude Code:
Convert this project to work on macOS.
Claude Code will adapt the PowerShell scripts to shell scripts, adjust paths, and handle macOS-specific differences.
- FastAPI — Python web framework for the local server
- Claude Haiku — Fast, affordable AI model (the brain)
- ElevenLabs — Natural text-to-speech (the voice)
- Playwright — Browser automation
- Web Speech API — Browser-native speech recognition
- sounddevice — Audio input for clap detection
Built by Julian with Claude Code.
Inspired by Iron Man's J.A.R.V.I.S. — "At your service, Sir."
MIT — use it, modify it, build on it. If you build something cool, let me know!