- https://hishab.co
- Gulshan-2 House 4A, Rd 96, Dhaka 1212
- https://saiful9379.github.io
- https://huggingface.co/saiful9379
- in/saiful-islam-907128ba
-
Irodori-TTS Public
Forked from Aratako/Irodori-TTS
A Flow Matching-based Text-to-Speech Model with Emoji-driven Style Control
Python MIT License Updated Feb 27, 2026 -
personaplex Public
Forked from NVIDIA/personaplex
PersonaPlex code.
Python MIT License Updated Jan 16, 2026 -
ART Public
Forked from OpenPipe/ART
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!
Python Apache License 2.0 Updated Sep 6, 2025 -
faster-whisper Public
Forked from SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
Python MIT License Updated Aug 16, 2025 -
auto-tuning-vllm Public
Forked from openshift-psap/auto-tuning-vllm
PSAP auto-tuning for vllm (vllm+guidellm+optuna)
Python Apache License 2.0 Updated Aug 12, 2025 -
S3Tokenizer Public
Forked from xingchensong/S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
Python Apache License 2.0 Updated Jun 16, 2025 -
ai-agents-for-beginners Public
Forked from microsoft/ai-agents-for-beginners
11 Lessons to Get Started Building AI Agents
Jupyter Notebook MIT License Updated May 21, 2025 -
sesame-explorations Public
Forked from thomwolf/sesame-explorations
Python Apache License 2.0 Updated Apr 28, 2025 -
LLaSA_training Public
Forked from zhenye234/LLaSA_training
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
Python Other Updated Apr 8, 2025 -
X-Codec-2.0 Public
Forked from zhenye234/X-Codec-2.0
Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
Python MIT License Updated Mar 12, 2025 -
awesome-japanese-nlp-resources Public
Forked from taishi-i/awesome-japanese-nlp-resources
A curated list of resources dedicated to Python libraries, LLMs, dictionaries, and corpora of NLP for Japanese
Creative Commons Zero v1.0 Universal Updated Feb 17, 2025 -
Zonos Public
Forked from sardorb3k/Zonos
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with, or even surpassing, top TTS …
Python Apache License 2.0 Updated Feb 16, 2025 -
nanospeech Public
Forked from lucasnewman/nanospeech
A simple, hackable text-to-speech system in PyTorch and MLX
Python MIT License Updated Feb 14, 2025 -
seed-vc Public
Forked from Plachtaa/seed-vc
Zero-shot voice conversion & singing voice conversion, with real-time support
Python GNU General Public License v3.0 Updated Feb 14, 2025 -
CosyVoice Public
Forked from FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Python Apache License 2.0 Updated Dec 18, 2024 -
avro.py Public
Forked from hitblast/avro.py
⚡ A modern Pythonic implementation of Avro Phonetic.
Python MIT License Updated Nov 27, 2024 -
mini-omni2 Public
Forked from gpt-omni/mini-omni2
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
Python MIT License Updated Nov 6, 2024 -
audioseal Public
Forked from facebookresearch/audioseal
Localized watermarking for AI-generated speech audios, with SOTA on robustness and very fast detector
Python MIT License Updated Oct 26, 2024 -
NeMo-text-processing Public
Forked from NVIDIA/NeMo-text-processing
NeMo text processing for ASR and TTS
Python Apache License 2.0 Updated Oct 23, 2024 -
LLaMA-Omni Public
Forked from menon92/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Python Apache License 2.0 Updated Sep 24, 2024 -
BigVGAN Public
Forked from NVIDIA/BigVGAN
Official PyTorch implementation of BigVGAN (ICLR 2023)
Python MIT License Updated Sep 5, 2024 -
descript-audio-codec Public
Forked from descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Python MIT License Updated Jul 11, 2024 -
xtts-finetune-tests Public
Forked from daswer123/xtts-finetune-tests
In this repository I will be running various experiments on fine-tuning different parts of XTTS
-
open-speech-corpora Public
Forked from coqui-ai/open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
MIT License Updated Jun 6, 2024 -
resume_ai Public
Introducing Smart Resume AI: Revolutionizing Resume Sorting and Job Matching

