VoiceCast turns a short audio sample into a voice you can use for text-to-speech — in 16 languages,
with expressive emotions, through a desktop app, command line, or Python API.
- You're building an audiobook, a game, or a prototype — and the voice matters, but hiring voice talent for every iteration is slow and expensive.
- You need multilingual narration but can't find a single voice that sounds natural across languages.
- Existing TTS tools produce robotic, flat output that doesn't match the expressiveness you need — no laughs, no sighs, no personality.
VoiceCast solves all three. Record 5–30 seconds of any voice, and generate natural, expressive speech in that voice — instantly, locally, for free.
- Any voice, cloned in seconds — Feed in a 5–30 second WAV sample and VoiceCast learns the voice. No training, no cloud upload, no waiting.
- 16 languages, one tool — English, Spanish, French, German, Chinese, Japanese, and 10 more. Switch languages without switching voices.
- Expressive speech that sounds human — Add `[laugh]`, `[sigh]`, `[gasp]`, and more with Chatterbox Turbo. Your cloned voice doesn't just talk — it performs.
- Three ways to use it — A polished desktop GUI for quick tasks, a CLI for automation, and a Python API for integration into your own projects.
- Runs on your machine — No API keys, no cloud dependencies, no per-word billing. Your voice data stays local.
- Install — Clone the repo, create a virtual environment, and `pip install -e .` — that's it.
- Pick a voice sample — Any clean 5–30 second audio clip of the voice you want to clone.
- Choose your engine — Coqui XTTS v2 for multilingual quality, or Chatterbox for speed and expressiveness.
- Generate speech — Type your text, hit generate, and get a WAV file in the cloned voice.
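A sample outside the 5–30 second window is the most common cause of poor clones, so it can help to check the clip before generating. A minimal sketch using only the standard-library `wave` module (the file name and the synthetic demo clip are illustrative, not part of VoiceCast):

```python
import math
import struct
import wave

def sample_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def is_usable_sample(path, min_s=5.0, max_s=30.0):
    """VoiceCast expects a clean 5-30 second speech clip."""
    return min_s <= sample_duration_seconds(path) <= max_s

# Write a 10-second 16 kHz mono sine tone as a stand-in for a real recording
with wave.open("demo_sample.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"".join(
        struct.pack("<h", int(8000 * math.sin(2 * math.pi * 220 * t / 16000)))
        for t in range(16000 * 10)
    ))

print(is_usable_sample("demo_sample.wav"))  # True
```

Anything that reads the WAV header works here; the point is to fail fast on clips that are too short or too long.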
| Engine | Languages | Speed | Best For |
|---|---|---|---|
| Coqui XTTS v2 | 16 | Medium | Multilingual narration, production quality |
| Chatterbox Turbo | English | Fast | Rapid iteration, expressive speech with emotion tags |
| Chatterbox Standard | English | Medium | High-fidelity English output |
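The table above reads as a simple decision rule. A small illustrative helper that encodes it (the engine name strings are placeholders chosen for this sketch, not VoiceCast identifiers):

```python
def pick_engine(language="en", need_emotion_tags=False, prioritize_speed=False):
    """Map the engine-comparison table to a choice.

    Engine names are illustrative labels, not VoiceCast API values.
    """
    if language != "en":
        # Coqui XTTS v2 is the only multilingual option
        return "coqui_xtts_v2"
    if need_emotion_tags or prioritize_speed:
        # Chatterbox Turbo: fast iteration and [laugh]/[sigh]-style tags
        return "chatterbox_turbo"
    # Chatterbox Standard: high-fidelity English output
    return "chatterbox_standard"
```

For example, French narration always lands on Coqui, while English text with emotion tags lands on Chatterbox Turbo.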
```bash
# Clone the repository
git clone https://github.com/luongnv89/voice-cast.git
cd voice-cast

# Create virtual environment
python3.10 -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install
pip install -e .
```

Launch the GUI:

```bash
python voice_cloning_app.py
```

Or use the CLI:

```bash
python vcloner.py -i voice.wav -t "Hello world" -o output.wav
```

Or call the Python API:

```python
from voice_cloner import VoiceCloner

cloner = VoiceCloner(speaker_wav="./voice-samples/speaker.wav")
cloner.say("Hello, this is my cloned voice!", save_audio=True, output_file="output.wav")
```

Add expressive speech with Chatterbox Turbo:

```python
cloner.say("That's hilarious [laugh]! I can't believe it [gasp]!")
```

Supported tags: `[laugh]`, `[chuckle]`, `[cough]`, `[sigh]`, `[gasp]`, `[yawn]`
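For longer scripts such as audiobook chapters, one common pattern is to split the text at sentence boundaries and synthesize each piece separately. A sketch of that pattern (the 400-character limit is an assumption for this example, not a documented VoiceCast constraint; the `cloner.say` loop is shown as a comment so the helper stays self-contained):

```python
import re

def chunk_text(text, max_chars=400):
    """Split text on sentence boundaries into chunks of at most max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

chapter = "It was a dark night. The rain would not stop. Somewhere, a door slammed."
for i, chunk in enumerate(chunk_text(chapter, max_chars=40), 1):
    # Each chunk would then be fed to the cloned voice, e.g.:
    # cloner.say(chunk, save_audio=True, output_file=f"part_{i:03d}.wav")
    print(i, chunk)
```

Per-sentence chunks also make it cheap to regenerate a single flubbed line instead of a whole chapter.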
**Is VoiceCast free?** Yes. VoiceCast is MIT licensed — free for personal and commercial use, forever. See LICENSE.

**Does it need a GPU?** No. VoiceCast runs on CPU. An NVIDIA GPU with CUDA speeds up generation significantly, and Apple Silicon users can install the optional MLX backend for hardware acceleration.

**What are the system requirements?** Python 3.10+, 8GB RAM (16GB recommended). Optional: NVIDIA GPU with CUDA or Apple Silicon with MLX.

**How does Coqui compare to Chatterbox?** Coqui XTTS v2 supports 16 languages and produces high-quality multilingual output. Chatterbox is English-only but faster and supports expressive emotion tags. Use both — VoiceCast makes switching engines seamless.
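When moving tagged text from Chatterbox to Coqui, a reasonable precaution is to strip the emotion tags first, since only Chatterbox Turbo is described as interpreting them. A minimal sketch (that Coqui would otherwise read the bracketed tags aloud is an assumption; the tag list matches the six tags documented above):

```python
import re

# The six emotion tags listed in this README
EMOTION_TAGS = ("laugh", "chuckle", "cough", "sigh", "gasp", "yawn")
TAG_PATTERN = re.compile(r"\s*\[(?:%s)\]" % "|".join(EMOTION_TAGS))

def strip_emotion_tags(text):
    """Remove Chatterbox emotion tags so text can go to a tag-unaware engine."""
    return TAG_PATTERN.sub("", text)

print(strip_emotion_tags("That's hilarious [laugh]! I can't believe it [gasp]!"))
# → That's hilarious! I can't believe it!
```

Keeping one tagged master script and stripping tags per engine avoids maintaining two copies of the text.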
Is my voice data sent to the cloud? No. Everything runs locally on your machine. No API keys, no cloud uploads, no telemetry.
Can I use this in production? Yes. VoiceCast provides a Python API designed for integration. See the API Reference for details.
How long does the voice sample need to be? 5–30 seconds of clean speech. Longer samples can improve quality, but even 5 seconds produces usable results.
VoiceCast puts voice cloning in your hands — no cloud, no cost, no restrictions. Clone voices for audiobooks, games, accessibility tools, creative projects, or anything else you can imagine.
MIT licensed. Runs locally. Works on Linux, macOS, and Windows.
## Documentation
| Document | Description |
|---|---|
| API Reference | Complete Python API documentation |
| CLI Reference | Command-line interface guide |
| GUI Guide | Desktop application user manual |
| Engines Guide | TTS engine comparison and parameters |
| Architecture | System design and patterns |
| Development | Contributing and setup guide |
| Troubleshooting | Common issues and solutions |
## System Requirements
- Python 3.10+
- 8GB RAM (16GB recommended)
- NVIDIA GPU with CUDA (optional, for faster processing)
- Apple Silicon with MLX (optional, for hardware acceleration on Mac)
Optional: Install Chatterbox Engine

```bash
pip install -e ".[chatterbox]"
```

Optional: Install MLX Backend (Apple Silicon)

```bash
pip install -e ".[mlx]"
```

- Coqui TTS — XTTS v2 model
- Chatterbox — Fast TTS by Resemble AI
- PyTorch — Deep learning framework
