Port of OpenAI's Whisper model in C/C++
ImageBind: One Embedding Space to Bind Them All
Training Large Language Models to Reason in a Continuous Latent Space
CLIP: Predict the most relevant text snippet given an image
State-of-the-art (SoTA) text-to-video pre-trained model
PyTorch code and models for V-JEPA 2 self-supervised learning from video
AI tool that removes hardcoded subtitles and text from videos locally
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
Physical Symbolic Optimization
Create videos with Stable Diffusion
PyTorch code and models for V-JEPA self-supervised learning from video
Topic Modelling for Humans
Implementation of Video Diffusion Models
Synchronized Translation for Videos
A Family of Open-Sourced Music Foundation Models
An open-source toolkit for monitoring Large Language Models (LLMs)
ESP32 Camera motion capture application to record JPEGs to SD card
Integrate cutting-edge LLM technology quickly and easily into your app
PyTorch version of Stable Baselines
Medical imaging toolkit for deep learning
Experimental Ant Design extensions for advanced UI patterns
Multi-modal large language model designed for audio understanding
InvokeAI is a leading creative engine for Stable Diffusion models
Your personal AI assistant, all in 888 KiB