lmsnation/voice-cast

 
 


Clone Any Voice from a 5-Second Clip

VoiceCast turns a short audio sample into a voice you can use for text-to-speech — in 16 languages,
with expressive emotions, through a desktop app, command line, or Python API.

Get Started in 60 Seconds →


Ever Needed a Specific Voice on Demand?

  • You're building an audiobook, a game, or a prototype — and the voice matters, but hiring voice talent for every iteration is slow and expensive.
  • You need multilingual narration but can't find a single voice that sounds natural across languages.
  • Existing TTS tools produce robotic, flat output that doesn't match the expressiveness you need — no laughs, no sighs, no personality.

VoiceCast addresses all three. Record 5–30 seconds of a voice, then generate natural, expressive speech in that voice on your own machine, at no cost.

What VoiceCast Gives You

  • Any voice, cloned in seconds — Feed in a 5–30 second WAV sample and VoiceCast learns the voice. No training, no cloud upload, no waiting.
  • 16 languages, one tool — English, Spanish, French, German, Chinese, Japanese, and 10 more. Switch languages without switching voices.
  • Expressive speech that sounds human — Add [laugh], [sigh], [gasp], and more with Chatterbox Turbo. Your cloned voice doesn't just talk — it performs.
  • Three ways to use it — A polished desktop GUI for quick tasks, a CLI for automation, and a Python API for integration into your own projects.
  • Runs on your machine — No API keys, no cloud dependencies, no per-word billing. Your voice data stays local.

Start Cloning Voices Now →

How It Works

  1. Install — Clone the repo, create a virtual environment, and pip install -e . — that's it.
  2. Pick a voice sample — Any clean 5–30 second audio clip of the voice you want to clone.
  3. Choose your engine — Coqui XTTS v2 for multilingual quality, or Chatterbox for speed and expressiveness.
  4. Generate speech — Type your text, hit generate, and get a WAV file in the cloned voice.

| Engine | Languages | Speed | Best For |
| --- | --- | --- | --- |
| Coqui XTTS v2 | 16 | Medium | Multilingual narration, production quality |
| Chatterbox Turbo | English | Fast | Rapid iteration, expressive speech with emotion tags |
| Chatterbox Standard | English | Medium | High-fidelity English output |
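
The engine comparison above can be read as a small decision rule. The sketch below is only an illustration of that rule: `pick_engine` and its `prefer_speed` parameter are hypothetical names, and the returned strings are the labels from the table, not identifiers from the VoiceCast API.

```python
import re

# Matches a bracketed lowercase tag such as [laugh] or [sigh].
EMOTION_TAG = re.compile(r"\[[a-z]+\]")


def pick_engine(language: str, text: str, prefer_speed: bool = False) -> str:
    """Suggest an engine label from the comparison table (hypothetical helper)."""
    if language.strip().lower() != "english":
        # Only Coqui XTTS v2 is listed as multilingual.
        return "Coqui XTTS v2"
    if prefer_speed or EMOTION_TAG.search(text):
        # Turbo is the fast variant and the one listed with emotion tags.
        return "Chatterbox Turbo"
    return "Chatterbox Standard"


if __name__ == "__main__":
    print(pick_engine("French", "Bonjour tout le monde"))
    print(pick_engine("English", "That's hilarious [laugh]!"))
    print(pick_engine("English", "A calm, plain sentence."))
```

Both Chatterbox variants assume the optional Chatterbox install described under the optional install steps.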

Try It Yourself →

Get Started in 60 Seconds

# Clone the repository
git clone https://github.com/luongnv89/voice-cast.git
cd voice-cast

# Create virtual environment
python3.10 -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install
pip install -e .

Launch the GUI:

python voice_cloning_app.py

Or use the CLI:

python vcloner.py -i voice.wav -t "Hello world" -o output.wav
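
For automation, the CLI call above can be wrapped in a short script. This is a minimal sketch that assumes it runs from the repository root and uses only the `-i`, `-t`, and `-o` flags shown above. `build_command`, `render_all`, the `out` directory, and the `line_001.wav` naming scheme are illustrative choices, not part of VoiceCast.

```python
import subprocess
import sys
from pathlib import Path


def build_command(sample: str, text: str, output: str) -> list[str]:
    """Build the argument list for one vcloner.py call (-i, -t, -o only)."""
    return [sys.executable, "vcloner.py", "-i", sample, "-t", text, "-o", output]


def render_all(sample: str, lines: list[str], out_dir: str = "out",
               dry_run: bool = False) -> list[list[str]]:
    """Run one CLI call per line of text and return the commands that were built."""
    if not dry_run:
        Path(out_dir).mkdir(parents=True, exist_ok=True)
    commands = []
    for index, text in enumerate(lines, start=1):
        output = str(Path(out_dir) / f"line_{index:03d}.wav")
        command = build_command(sample, text, output)
        commands.append(command)
        if not dry_run:
            # check=True stops the batch at the first failing call.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # dry_run=True only prints the commands, so this is safe to try first.
    for cmd in render_all("voice.wav", ["Hello world", "Second line"], dry_run=True):
        print(" ".join(cmd))
```

Each call starts a new Python process, so for many lines the Python API is likely the better fit.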

Or call the Python API:

from voice_cloner import VoiceCloner

cloner = VoiceCloner(speaker_wav="./voice-samples/speaker.wav")
cloner.say("Hello, this is my cloned voice!", save_audio=True, output_file="output.wav")

Add expressive speech with Chatterbox Turbo:

cloner.say("That's hilarious [laugh]! I can't believe it [gasp]!")

Supported tags: [laugh], [chuckle], [cough], [sigh], [gasp], [yawn]
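
A script can check for misspelled tags before generation. The sketch below compares every bracketed token against the six tags listed above. `unsupported_tags` is a hypothetical helper, and the exact, case-sensitive match is an assumption, since the README does not say how the engine treats other spellings.

```python
import re

# The six tags listed in the README.
SUPPORTED_TAGS = {"laugh", "chuckle", "cough", "sigh", "gasp", "yawn"}

# Captures the text inside each [ ... ] pair.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def unsupported_tags(text: str) -> list[str]:
    """Return bracketed tokens that are not in the supported list, in order."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag not in SUPPORTED_TAGS]


if __name__ == "__main__":
    print(unsupported_tags("That's hilarious [laugh]! I can't believe it [gasp]!"))
    print(unsupported_tags("Well [sigh], that was [giggle] unexpected."))
```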

FAQ

Is VoiceCast free? Yes. VoiceCast is MIT licensed — free for personal and commercial use, forever. See LICENSE.

Does it need a GPU? No. VoiceCast runs on CPU. An NVIDIA GPU with CUDA speeds up generation significantly, and Apple Silicon users can install the optional MLX backend for hardware acceleration.

What are the system requirements? Python 3.10+, 8GB RAM (16GB recommended). Optional: NVIDIA GPU with CUDA or Apple Silicon with MLX.

How does Coqui compare to Chatterbox? Coqui XTTS v2 supports 16 languages and produces high-quality multilingual output. Chatterbox is English-only; its Turbo variant is faster and supports expressive emotion tags, while Chatterbox Standard targets high-fidelity English output. You can use both, since VoiceCast lets you switch engines.

Is my voice data sent to the cloud? No. Everything runs locally on your machine. No API keys, no cloud uploads, no telemetry.

Can I use this in production? Yes. VoiceCast provides a Python API designed for integration. See the API Reference for details.

How long does the voice sample need to be? 5–30 seconds of clean speech. Longer samples can improve quality, but even 5 seconds produces usable results.
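
The 5–30 second guideline can be checked before cloning. This sketch uses only the standard-library `wave` module, so it assumes an uncompressed PCM WAV file. `sample_duration` and `is_recommended_length` are hypothetical helper names, and `voice.wav` is a placeholder path.

```python
import wave

MIN_SECONDS = 5.0   # lower bound from the README
MAX_SECONDS = 30.0  # upper bound from the README


def sample_duration(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_recommended_length(path: str) -> bool:
    """True when the clip falls inside the 5-30 second guideline."""
    return MIN_SECONDS <= sample_duration(path) <= MAX_SECONDS


if __name__ == "__main__":
    import sys

    clip = sys.argv[1] if len(sys.argv) > 1 else "voice.wav"
    print(f"{clip}: {sample_duration(clip):.1f} s, "
          f"within guideline: {is_recommended_length(clip)}")
```

The check covers length only; it does not assess whether the speech is clean.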

Start Building with VoiceCast

VoiceCast puts voice cloning in your hands — no cloud, no cost, no restrictions. Clone voices for audiobooks, games, accessibility tools, creative projects, or anything else you can imagine.

MIT licensed. Runs locally. Works on Linux, macOS, and Windows.

Get Started in 60 Seconds →


Documentation

| Document | Description |
| --- | --- |
| API Reference | Complete Python API documentation |
| CLI Reference | Command-line interface guide |
| GUI Guide | Desktop application user manual |
| Engines Guide | TTS engine comparison and parameters |
| Architecture | System design and patterns |
| Development | Contributing and setup guide |
| Troubleshooting | Common issues and solutions |

System Requirements

  • Python 3.10+
  • 8GB RAM (16GB recommended)
  • NVIDIA GPU with CUDA (optional, for faster processing)
  • Apple Silicon with MLX (optional, for hardware acceleration on Mac)

Optional: Install Chatterbox Engine

pip install -e ".[chatterbox]"

Optional: Install MLX Backend (Apple Silicon)

pip install -e ".[mlx]"
