Stars
ManimML is a project focused on providing animations and visualizations of common machine learning concepts with the Manim Community Library.
Official training and inference code for VBVR (A Very Big Video Reasoning Suite)
[LightX2V](https://x2v.light-ai.top) integration for [OpenClaw](https://openclaw.ai) — image generation (t2i/i2i), video (t2v/i2v/s2v), TTS, and voice clone via cloud API.
[CVPR 2026] ConsistCompose: Unified Multimodal Layout Control for Image Composition
[CVPR 2026] EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents
Reinforcement Learning Framework for Visual Generation
In our implementation of Qwen-Image-Edit, we employ block causal attention to improve inference speed.
This is a collection of recent papers on reasoning in video generation models.
Agentic LaTeX Writer - Local-first editor for AI-assisted academic writing
[ICLR 2026] Official code for "The Quest for Generalizable Motion Generation: Data, Model, and Evaluation"
NEO Series: Native Vision-Language Models from First Principles
[CVPR 2026] OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
[CVPR 2026] Scaling Spatial Intelligence with Multimodal Foundation Models
An open-source evaluation toolkit to evaluate MLLMs on Spatial Intelligence using the EASI protocol
Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
This is a framework for evaluating reasoning in foundational Video Models.
Speech2Motion is a real-time streaming system that converts speech input into synchronized 3D character animations. The system provides intelligent motion matching based on speech content, keywords…
Audio2Face is a real-time audio-to-face animation service that converts streaming audio input into synchronized facial animation data. The system uses advanced machine learning models to extract au…
Orchestrator is a real-time intelligent conversation system for building personalized multimodal AI interaction workflows, including speech recognition (ASR), text conversation (LLM), text-to-speec…
Open-source Autonomous 3D Characters on the Web
A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.
🎖️ A collection of badges for your project's README
Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
ModelTC / Wan2.2-Lightning
Forked from Wan-Video/Wan2.2
Wan2.2-Lightning: Speed up wan2.2 model with distillation





