brhrmaster/jarvis-ai-assistant

PyJarvis - AI Assistant

A Python implementation of the Jarvis text-to-voice assistant with an animated digital face, LLM integration, and speech recognition.

[Image: UI]

Architecture

Modular Python application following Clean Architecture, SOLID principles, and Design Patterns

Components

  • pyjarvis_shared: Shared types, messages, and centralized configuration
  • pyjarvis_core: Core domain logic (text analysis, TTS processors, audio processing, animations)
  • pyjarvis_service: Service layer for text-audio processing (TCP IPC)
  • pyjarvis_cli: CLI application that sends text to the service
  • pyjarvis_ui: Desktop UI with animated robot face (Pygame)
  • pyjarvis_llama: LLM integration with speech recognition (Ollama + RealtimeSTT)

Components Relationship

[Image: Architecture]

Requirements

  • Python 3.10+
  • FFmpeg (for MP3 processing) — see the "Install FFmpeg" section below
  • Optional: Ollama (for LLM features)
  • See requirements.txt for all dependencies

Installation

pip install -r requirements.txt

Quick Start

Full architecture (recommended)

  1. Start the service in terminal 1:
python -m pyjarvis_service
  2. Start the UI in terminal 2:
python -m pyjarvis_ui
  3. Send text from terminal 3:
python -m pyjarvis_cli "Hello, I am Jarvis"

You can also specify the language manually:

python -m pyjarvis_cli "Hola, soy Jarvis" --language es
python -m pyjarvis_cli "Olá, eu sou Jarvis" --language pt-BR

LLM + Speech Recognition (optional)

  1. Start Ollama (if using local LLM):
ollama serve
  2. Start the service and UI as above.
  3. Start the LLM CLI in another terminal:
python -m pyjarvis_llama

In the LLM CLI you can:

  • Type messages and press Enter for text input
  • Use /m to record audio from the microphone (press Enter to stop)
  • Use /lang <code> to change recognition language (STT)
  • Use /persona <name> to change AI persona
  • Automatic language detection: The system automatically detects the language of LLM responses and uses the correct TTS voice
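
The slash commands listed above could be dispatched along the following lines. This is only a sketch: `Command` and `parse_input` are hypothetical names for illustration and are not taken from pyjarvis_llama, whose real parser may behave differently.

```python
from dataclasses import dataclass


@dataclass
class Command:
    """Hypothetical parsed form of one line typed into the LLM CLI."""
    kind: str           # "text", "mic", "lang", or "persona"
    argument: str = ""  # language code, persona name, or the message text


def parse_input(line: str) -> Command:
    """Classify a line as /m, /lang <code>, /persona <name>, or plain text."""
    stripped = line.strip()
    if stripped == "/m":
        return Command("mic")
    for name in ("lang", "persona"):
        prefix = f"/{name} "
        if stripped.startswith(prefix):
            return Command(name, stripped[len(prefix):].strip())
    # Anything else, including a bare "/lang" without an argument,
    # is treated as a normal message in this sketch.
    return Command("text", stripped)
```

For example, `parse_input("/lang pt-BR")` yields a `lang` command with argument `pt-BR`, while ordinary sentences come back as `text`.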

Standalone UI

The UI can run standalone but is much more useful connected to the service:

python -m pyjarvis_ui

Project Structure

Top-level layout (canonical):

pyJarvis/
├── README.md
├── requirements.txt
├── pyjarvis_shared/
├── pyjarvis_core/
├── pyjarvis_service/
├── pyjarvis_cli/
├── pyjarvis_ui/
├── pyjarvis_llama/
├── audio/
├── assets/
├── models/
└── docs/
  • pyjarvis_shared/: AppConfig, message types and shared utilities
  • pyjarvis_core/: TextAnalyzer, AnimationController, TTS factory and processors
  • pyjarvis_service/: IPC server and TextProcessor orchestration
  • pyjarvis_ui/: Pygame-based UI (FaceRenderer, AudioPlayer, ServiceClient)
  • pyjarvis_llama/: LLM CLI, Ollama client, STT recorder, personas

Features

  • Animated robot face with lip-sync and emotion-driven effects
  • Multiple TTS engines (Edge-TTS default, gTTS available)
  • Automatic language detection for TTS (Portuguese, English, Spanish) using langdetect with heuristic fallback
  • STT integration via RealtimeSTT / Whisper models
  • LLM support via Ollama and configurable AI personas
  • Multi-language support: Automatic detection and manual override for TTS language selection
  • TCP-based IPC; UI registers for broadcast updates from service
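
To illustrate the Ollama side of the LLM support, here is a minimal sketch of a non-streaming call to Ollama's `/api/generate` REST endpoint using only the standard library. The model name `llama3` and the helper names are assumptions for illustration; pyjarvis_llama's own client may be structured differently (for example, on top of aiohttp).

```python
import json
import urllib.request


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(prompt: str, model: str = "llama3",
               base_url: str = "http://localhost:11434",
               timeout: float = 60.0) -> str:
    """Send the prompt to a running Ollama server and return the reply text."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, Ollama returns one JSON object whose
    # "response" field holds the generated text.
    return body["response"]
```

`ask_ollama("Say hello")` requires `ollama serve` to be running and the chosen model to be pulled locally.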

Configuration

All runtime configuration is centralized in pyjarvis_shared/config.py (AppConfig):

  • TTS processor selection (tts_processor)
  • Audio output directory and auto-delete behavior
  • Edge-TTS voice mapping (edge_tts_voices)
  • STT model and language
  • Ollama base URL and model
  • Language detection: Uses langdetect library with heuristic fallback for automatic language detection
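
As a rough picture of what a centralized configuration object covering the settings above might look like, consider the dataclass below. Only `tts_processor` and `edge_tts_voices` are names taken from this README; every other field name and every default value is a hypothetical placeholder, and the real `AppConfig` in pyjarvis_shared/config.py may differ.

```python
from dataclasses import dataclass, field
from pathlib import Path


def _default_voices() -> dict[str, str]:
    # Illustrative subset of a language-to-voice mapping.
    return {"pt-br": "pt-BR-FranciscaNeural", "en": "en-US-AriaNeural"}


@dataclass
class AppConfigSketch:
    """Illustrative stand-in for AppConfig; not the project's actual class."""
    tts_processor: str = "edge-tts"              # value is an assumption
    audio_output_dir: Path = Path("./audio")     # hypothetical field name
    auto_delete_audio: bool = True               # hypothetical field name
    edge_tts_voices: dict[str, str] = field(default_factory=_default_voices)
    stt_model: str = "base"                      # hypothetical field name
    stt_language: str = "en"                     # hypothetical field name
    ollama_base_url: str = "http://localhost:11434"  # hypothetical field name
    ollama_model: str = "llama3"                 # hypothetical field name
```

Using `default_factory` for the voice mapping gives each config instance its own dictionary, so changing one instance does not affect another.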

Language Detection

PyJarvis includes automatic language detection for TTS voice selection:

  • Primary method: Uses langdetect library for accurate language detection
  • Fallback: Heuristic-based detection using language-specific patterns and keywords
  • Supported languages: Portuguese (pt-BR), English (en-US), Spanish (es-ES, es-MX, es-AR, etc.)
  • Manual override: You can specify the language manually using the --language flag in CLI

The system automatically detects the language of:

  • Text sent via CLI (if no language is specified)
  • LLM responses in the interactive CLI (automatically detected and sent to TTS with correct language code)
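
The two-stage approach described above (langdetect first, heuristics as fallback) can be sketched as follows. The keyword sets and function names are illustrative assumptions, not the patterns PyJarvis actually uses.

```python
import re

# Hypothetical keyword sets; a real fallback would use richer patterns.
KEYWORDS = {
    "pt": {"não", "você", "obrigado", "olá", "está", "uma", "eu", "sou"},
    "es": {"hola", "gracias", "usted", "está", "una", "soy", "el", "los"},
    "en": {"the", "hello", "thanks", "you", "is", "and", "am", "i"},
}


def heuristic_detect(text: str, default: str = "en") -> str:
    """Pick the language whose keyword set matches the most words."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(w in kws for w in words) for lang, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def detect_language(text: str, default: str = "en") -> str:
    """Try langdetect first; fall back to the keyword heuristic."""
    try:
        from langdetect import DetectorFactory, detect
        from langdetect.lang_detect_exception import LangDetectException
    except ImportError:
        return heuristic_detect(text, default)
    DetectorFactory.seed = 0  # langdetect is non-deterministic without a seed
    try:
        code = detect(text)
    except LangDetectException:
        return heuristic_detect(text, default)
    # Trust langdetect only for the three supported languages.
    return code if code in KEYWORDS else heuristic_detect(text, default)
```

Short inputs are where a statistical detector tends to be least reliable, which is one reason to keep a keyword fallback for the supported languages.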

Language Codes

  • Portuguese: pt, pt-BR, portuguese
  • English: en, en-US, en-GB, english
  • Spanish: es, es-ES, es-MX, es-AR, spanish, español

Text-to-Speech (Quick notes)

  • Edge-TTS (Microsoft) is the default high-quality engine. Voice mapping is configurable by language.
  • gTTS (Google) is supported as a fallback (requires internet and FFmpeg for MP3→WAV conversion).
  • Automatic language detection ensures the correct voice is used for each language.

Install FFmpeg

PyJarvis uses FFmpeg (via pydub) to convert MP3 audio (e.g., produced by gTTS) to WAV.

Windows options:

  • Chocolatey (recommended):
choco install ffmpeg
  • winget:
winget install ffmpeg

Verify installation:

ffmpeg -version

If you prefer not to install FFmpeg, use a TTS engine that emits WAV directly.

Edge-TTS Voice Mapping

Configure voices in pyjarvis_shared/config.py (example):

from pyjarvis_shared import AppConfig
config = AppConfig()
config.edge_tts_voices = {
    "pt-br": "pt-BR-HumbertoNeural",
    "pt": "pt-BR-FranciscaNeural",
    "en": "en-US-AriaNeural",
    "en-us": "en-US-GuyNeural",
    "es": "es-ES-ElviraNeural",
    "es-es": "es-ES-ElviraNeural",
    "es-mx": "es-MX-DaliaNeural",
    "es-ar": "es-AR-ElenaNeural"
}
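
Given a mapping like the one above, a lookup with fallback might work as sketched here: try the full code, then the base language, then a default voice. `pick_voice` and the choice of default are assumptions for illustration; the actual lookup in PyJarvis may differ.

```python
DEFAULT_VOICE = "en-US-AriaNeural"  # assumed fallback voice

VOICES = {
    "pt-br": "pt-BR-HumbertoNeural",
    "pt": "pt-BR-FranciscaNeural",
    "en": "en-US-AriaNeural",
    "en-us": "en-US-GuyNeural",
    "es": "es-ES-ElviraNeural",
    "es-es": "es-ES-ElviraNeural",
    "es-mx": "es-MX-DaliaNeural",
    "es-ar": "es-AR-ElenaNeural",
}


def pick_voice(language: str, voices: dict[str, str],
               default: str = DEFAULT_VOICE) -> str:
    """Resolve a language code to a voice, falling back to the base language."""
    key = language.strip().lower()
    if key in voices:
        return voices[key]
    base = key.split("-", 1)[0]  # "es-co" -> "es"
    return voices.get(base, default)
```

With this mapping, a region without its own entry (such as `es-CO`) falls back to the generic `es` voice rather than to English.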

Available Voices

Portuguese (pt-BR) example voices

  • pt-BR-FranciscaNeural - female (default)
  • pt-BR-HumbertoNeural - male
  • pt-BR-AntonioNeural - male
  • pt-BR-BrendaNeural - female
  • pt-BR-DonatoNeural - male
  • pt-BR-ElzaNeural - female
  • pt-BR-FabioNeural - male
  • pt-BR-GiovannaNeural - female
  • pt-BR-JulioNeural - male
  • pt-BR-LeilaNeural - female
  • pt-BR-LeticiaNeural - female
  • pt-BR-ManuelaNeural - female
  • pt-BR-NicolauNeural - male
  • pt-BR-ThalitaNeural - female
  • pt-BR-ValerioNeural - male
  • pt-BR-YaraNeural - female

English (en-US) example voices

  • en-US-AriaNeural - female (default)
  • en-US-GuyNeural - male
  • en-US-JennyNeural - female
  • en-US-AmberNeural - female
  • en-US-AnaNeural - female (child)
  • en-US-AshleyNeural - female
  • en-US-BrandonNeural - male
  • en-US-ChristopherNeural - male
  • en-US-CoraNeural - female
  • en-US-ElizabethNeural - female
  • en-US-EricNeural - male
  • en-US-JacobNeural - male
  • en-US-JaneNeural - female
  • en-US-JasonNeural - male
  • en-US-MichelleNeural - female
  • en-US-MonicaNeural - female
  • en-US-NancyNeural - female
  • en-US-RogerNeural - male
  • en-US-SaraNeural - female
  • en-US-TonyNeural - male

Spanish (es-ES) example voices

  • es-ES-ElviraNeural - female (default, Bright, Clear)
  • es-ES-AlvaroNeural - male (Confident, Animated)
  • es-ES-AbrilNeural - female
  • es-ES-ArabellaMultilingualNeural - female (Cheerful, Friendly, Casual, Warm, Pleasant)
  • es-ES-ArnauNeural - male
  • es-ES-DarioNeural - male
  • es-ES-EliasNeural - male
  • es-ES-EstrellaNeural - female
  • es-ES-IreneNeural - female (Curious, Cheerful)
  • es-ES-IsidoraMultilingualNeural - female (Cheerful, Friendly, Warm, Casual)
  • es-ES-LaiaNeural - female
  • es-ES-LiaNeural - female (Animated, Bright)
  • es-ES-NilNeural - male
  • es-ES-SaulNeural - male
  • es-ES-TeoNeural - male
  • es-ES-TrianaNeural - female
  • es-ES-VeraNeural - female
  • es-ES-XimenaNeural - female

Spanish (es-MX) example voices

  • es-MX-DaliaNeural - female (default)
  • es-MX-DaliaMultilingualNeural - female
  • es-MX-BeatrizNeural - female
  • es-MX-CandelaNeural - female
  • es-MX-CarlotaNeural - female
  • es-MX-CecilioNeural - male
  • es-MX-GerardoNeural - male
  • es-MX-JorgeNeural - male
  • es-MX-JorgeMultilingualNeural - male
  • es-MX-LarissaNeural - female
  • es-MX-LibertoNeural - male
  • es-MX-LucianoNeural - male
  • es-MX-MarinaNeural - female
  • es-MX-NuriaNeural - female
  • es-MX-PelayoNeural - male
  • es-MX-RenataNeural - female
  • es-MX-YagoNeural - male

Spanish (es-AR) example voices

  • es-AR-ElenaNeural - female (Bright, Clear)
  • es-AR-TomasNeural - male

Spanish (es-CO) example voices

  • es-CO-SalomeNeural - female
  • es-CO-GonzaloNeural - male

Other Spanish regional voices

  • es-CL-CatalinaNeural - female (Chile)
  • es-CL-LorenzoNeural - male (Chile)
  • es-PE-AlexNeural - male (Peru)
  • es-PE-CamilaNeural - female (Peru)
  • es-US-AlonsoNeural - male (US Spanish)
  • es-US-PalomaNeural - female (US Spanish)
  • es-UY-MateoNeural - male (Uruguay)
  • es-UY-ValentinaNeural - female (Uruguay)
  • es-VE-PaolaNeural - female (Venezuela)
  • es-VE-SebastianNeural - male (Venezuela)

For a complete list of all available Spanish voices, run:

edge-tts --list-voices | grep "^es-"
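
On systems without grep, the same list can be obtained from Python through the edge-tts package's async `list_voices()` function, which returns one dictionary per voice with keys such as `ShortName` and `Locale`. The helper names below are illustrative.

```python
import asyncio


def filter_voices(voices: list[dict], locale_prefix: str) -> list[str]:
    """Return sorted short names of voices whose locale starts with the prefix."""
    prefix = locale_prefix.lower()
    return sorted(
        voice["ShortName"]
        for voice in voices
        if voice["Locale"].lower().startswith(prefix)
    )


async def fetch_voices(locale_prefix: str) -> list[str]:
    """Download the voice catalogue (needs internet) and filter it."""
    import edge_tts  # imported lazily; requires the edge-tts package

    return filter_voices(await edge_tts.list_voices(), locale_prefix)


# Usage (performs a network request):
#     for name in asyncio.run(fetch_voices("es-")):
#         print(name)
```

Keeping the filter separate from the download makes it easy to reuse on a cached voice list.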

Testing / Verification

Quick test flow:

  1. Start the service: python -m pyjarvis_service (should listen on 127.0.0.1:8888)
  2. Start the UI: python -m pyjarvis_ui (window should open and attempt to connect)
  3. Send text via CLI: python -m pyjarvis_cli "Hello, I am Jarvis"

Expected: service processes text, generates an audio file in ./audio/, broadcasts a VoiceProcessingUpdate, and the UI plays the audio while animating the face.
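
Before sending text, it can help to confirm that the service is actually listening. The sketch below only checks that a TCP connection to 127.0.0.1:8888 succeeds; it does not speak the PyJarvis message protocol, and `is_port_open` is an illustrative helper name.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 8888,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If `is_port_open()` returns `False` after step 1, the service either failed to start or is bound to a different address or port.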

Testing checklist (manual):

  • Service starts without errors
  • UI connects to service
  • CLI can send text
  • Audio is generated and played
  • Robot face animates during speech
  • Audio files are cleaned up (if configured)
  • LLM CLI connects to Ollama (if used)
  • Speech recognition works (/m in LLM CLI)
  • Language detection works correctly (test with Portuguese, English, Spanish text)
  • Manual language override works (--language flag in CLI)

About

An AI assistant that listens to and talks with the user
