Showing 163 open source projects for "space"

View related business solutions
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Access competitive interest rates on your digital assets.

    Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 1
    whisper.cpp

    whisper.cpp

    Port of OpenAI's Whisper model in C/C++

    ...The command downloads the base.en model converted to custom ggml format and runs the inference on all .wav samples in the folder samples. whisper.cpp supports integer quantization of the Whisper ggml models. Quantized models require less memory and disk space and depending on the hardware can be processed more efficiently.
    Downloads: 357 This Week
    Last Update:
    See Project
  • 2
    ImageBind

    ImageBind

    ImageBind One Embedding Space to Bind Them All

    ImageBind is a multimodal embedding framework that learns a shared representation space across six modalities—images, text, audio, depth, thermal, and IMU (inertial motion) data—without requiring explicit pairwise training for every modality combination. Instead of aligning each pair independently, ImageBind uses image data as the central binding modality, aligning all other modalities to it so they can interoperate zero-shot.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Coconut

    Coconut

    Training Large Language Model to Reason in a Continuous Latent Space

    Coconut is the official PyTorch implementation of the research paper “Training Large Language Models to Reason in a Continuous Latent Space.” The framework introduces a novel method for enhancing large language models (LLMs) with continuous latent reasoning steps, enabling them to generate and refine reasoning chains within a learned latent space rather than relying solely on discrete symbolic reasoning. It supports training across multiple reasoning paradigms—including standard Chain-of-Thought (CoT), no-thought, and hybrid configurations—using configurable training stages and latent representations. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    CLIP

    CLIP

    CLIP, Predict the most relevant text snippet given an image

    CLIP (Contrastive Language-Image Pretraining) is a neural model that links images and text in a shared embedding space, allowing zero-shot image classification, similarity search, and multimodal alignment. It was trained on large sets of (image, caption) pairs using a contrastive objective: images and their matching text are pulled together in embedding space, while mismatches are pushed apart. Once trained, you can give it any text labels and ask it to pick which label best matches a given image—even without explicit training for that classification task. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    Step-Video-T2V

    Step-Video-T2V

    State-of-the-art (SoTA) text-to-video pre-trained model

    ...Under the hood it uses a compressed latent representation (a Video-VAE) to reduce spatial and temporal redundancy, and a denoising diffusion (or similar) process over that latent space to generate smooth, plausible motion and visuals. The model handles bilingual input (e.g. English and Chinese) thanks to dual encoders, and supports end-to-end text-to-video generation without requiring external assets. Its training and generation pipeline includes techniques like flow-matching, full 3D attention for temporal consistency, and fine-tuning approaches (e.g. video-based DPO) to improve fidelity and reduce artifacts. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 6
    vJEPA-2

    vJEPA-2

    PyTorch code and models for VJEPA2 self-supervised learning from video

    VJEPA2 is a next-generation self-supervised learning framework for video that extends the “predict in representation space” idea from i-JEPA to the temporal domain. Instead of reconstructing pixels, it predicts the missing high-level embeddings of masked space-time regions using a context encoder and a slowly updated target encoder. This objective encourages the model to learn semantics, motion, and long-range structure without the shortcuts that pixel-level losses can invite. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Video-subtitle-remover (VSR)

    Video-subtitle-remover (VSR)

    AI tool that removes hardcoded subtitles and text from videos locally

    Video Subtitle Remover is an AI-based application designed to remove hardcoded subtitles from videos and generate new files without the embedded text. Video Subtitle Remover analyzes video frames and detects subtitle regions, then replaces the removed areas using an AI algorithm that fills the space with reconstructed visual content. This process aims to maintain the original resolution and visual continuity of the video after subtitle removal. It allows users to define a specific subtitle region so that only text in that area is removed rather than modifying the entire frame. It can also automatically remove text throughout the whole video when a position is not specified. ...
    Downloads: 89 This Week
    Last Update:
    See Project
  • 8
    VoxCPM

    VoxCPM

    TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

    VoxCPM is a tokenizer-free text-to-speech system that models speech in a continuous space, aiming for extremely realistic, context-aware synthesis and true-to-life zero-shot voice cloning. Instead of converting speech into discrete tokens, it uses an end-to-end diffusion-autoregressive architecture built on the MiniCPM-4 backbone, combining hierarchical language modeling, finite scalar quantization (FSQ), and local Diffusion Transformers.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 9
    UForm

    UForm

    Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion

    UForm is a Multi-Modal Modal Inference package, designed to encode Multi-Lingual Texts, Images, and, soon, Audio, Video, and Documents, into a shared vector space! It comes with a set of homonymous pre-trained networks available on HuggingFace portal and extends the transfromers package to support Mid-fusion Models. Late-fusion models encode each modality independently, but into one shared vector space. Due to independent encoding late-fusion models are good at capturing coarse-grained features but often neglect fine-grained ones. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 10
    Physical Symbolic Optimization (Φ-SO)

    Physical Symbolic Optimization (Φ-SO)

    Physical Symbolic Optimization

    Physical Symbolic Optimization (Φ-SO) - A symbolic optimization package built for physics. Symbolic regression module uses deep reinforcement learning to infer analytical physical laws that fit data points, searching in the space of functional forms.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    stable-diffusion-videos

    stable-diffusion-videos

    Create videos with Stable Diffusion

    Create videos with Stable Diffusion by exploring the latent space and morphing between text prompts. Try it yourself in Colab.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    JEPA

    JEPA

    PyTorch code and models for V-JEPA self-supervised learning from video

    ...The repository provides training recipes, data pipelines, and evaluation utilities for image JEPA variants and often includes ablations that illuminate which masking and architectural choices matter. Because the objective is non-autoregressive and operates in embedding space, JEPA tends to be compute-efficient and stable at scale. The approach has become a strong alternative to contrastive or pixel-reconstruction methods for representation learning.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    gensim

    gensim

    Topic Modelling for Humans

    Gensim is a Python library for topic modeling, document indexing, and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 14
    Video Diffusion - Pytorch

    Video Diffusion - Pytorch

    Implementation of Video Diffusion Models

    Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to Video Generation - in Pytorch. Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to Video Generation - in Pytorch. It uses a special space-time factored U-net, extending generation from 2D images to 3D videos. 14k for difficult moving mnist (converging much faster and better than NUWA) - wip. Any new developments for text-to-video synthesis will be centralized at Imagen-pytorch. For conditioning on text, they derived text embeddings by first passing the tokenized text through BERT-large. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    SoniTranslate

    SoniTranslate

    Synchronized Translation for Videos

    ...The project supports a wide range of languages for translation, spanning major world languages (English, Spanish, French, German, Chinese, Arabic, etc.) and many regional or less widely spoken languages, making it suitable for broad internationalization. It offers multiple usage modes, including a Colab notebook for cloud-based experimentation, a Hugging Face Space demo for quick trials, and instructions.
    Downloads: 29 This Week
    Last Update:
    See Project
  • 16
    HeartMuLa

    HeartMuLa

    A Family of Open Sourced Music Foundation Models

    ...For text extraction from audio, it provides HeartTranscriptor, a Whisper-based model tuned specifically for lyrics transcription, which helps bridge generated or recorded audio back into structured text. It also introduces HeartCLAP, which aligns audio and text into a shared embedding space.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 17
    LangKit

    LangKit

    An open-source toolkit for monitoring Language Learning Models (LLMs)

    ...Productionizing language models, including LLMs, comes with a range of risks due to the infinite amount of input combinations, which can elicit an infinite amount of outputs. The unstructured nature of text poses a challenge in the ML observability space - a challenge worth solving, since the lack of visibility on the model's behavior can have serious consequences.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 18
    ESP32-CAM_MJPEG2SD

    ESP32-CAM_MJPEG2SD

    ESP32 Camera motion capture application to record JPEGs to SD card

    ...The AVI format allows recordings to replay at the correct frame rate on media players. If a microphone is installed then a WAV file is also created and stored in the AVI file. The ESP32 cannot support all of the features as it will run out of heap space. For better functionality and performance, use one of the new ESP32S3 camera boards, eg Freenove ESP32S3 Cam, and ESP32S3 XIAO Sense, but avoid no-name boards marked ESPS3 RE:1.0.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 19
    Semantic Kernel

    Semantic Kernel

    Integrate cutting-edge LLM technology quickly and easily into your app

    Semantic Kernel is an open-source SDK that lets you easily combine AI services like OpenAI, Azure OpenAI, and Hugging Face with conventional programming languages like C# and Python. By doing so, you can create AI apps that combine the best of both worlds. To help developers build their own Copilot experiences on top of AI plugins, we have released Semantic Kernel, a lightweight open-source SDK that allows you to orchestrate AI plugins. With Semantic Kernel, you can leverage the same AI...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 20
    Stable Baselines3

    Stable Baselines3

    PyTorch version of Stable Baselines

    Stable Baselines3 (SB3) is a set of reliable implementations of reinforcement learning algorithms in PyTorch. It is the next major version of Stable Baselines. You can read a detailed presentation of Stable Baselines3 in the v1.0 blog post or our JMLR paper. These algorithms will make it easier for the research community and industry to replicate, refine, and identify new ideas, and will create good baselines to build projects on top of. We expect these tools will be used as a base around...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 21
    TorchIO

    TorchIO

    Medical imaging toolkit for deep learning

    ...These transforms include typical computer vision operations such as random affine transformations and also domain-specific ones such as simulation of intensity artifacts due to MRI magnetic field inhomogeneity (bias) or k-space motion artifacts. TorchIO is a Python package containing a set of tools to efficiently read, preprocess, sample, augment, and write 3D medical images in deep learning applications written in PyTorch, including intensity and spatial transforms for data augmentation and preprocessing. Transforms include typical computer vision operations such as random affine transformations and also domain-specific ones such as simulation of intensity artifacts due to MRI magnetic field inhomogeneity.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 22
    Ant Design X

    Ant Design X

    Experimental Ant Design extensions for advanced UI patterns

    Ant Design X is an experimental extension project built around the Ant Design ecosystem, focusing on exploring advanced user interface patterns and next-generation component designs. It serves as a space for prototyping and validating ideas that go beyond the core Ant Design library, allowing developers to experiment with more complex interactions and layouts. Ant Design X emphasizes flexibility and extensibility, enabling developers to adapt components to modern application needs. It often includes higher-level abstractions and enhanced UI capabilities that are not yet part of the stable design system. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 23
    Step-Audio 2

    Step-Audio 2

    Multi-modal large language model designed for audio understanding

    Step-Audio2 is an advanced, end-to-end multimodal large language model designed for high-fidelity audio understanding and natural speech conversation: unlike many pipelines that separate speech recognition, processing, and synthesis, Step-Audio2 processes raw audio, reasons about semantic and paralinguistic content (like emotion, speaker characteristics, non-verbal cues), and can generate contextually appropriate responses — including potentially generating or transforming audio output. It integrates a latent-space audio encoder, discrete acoustic tokens, and reinforcement-learning–based training (CoT + RL) to enhance its ability to capture and reproduce voice styles, intonations, and subtle vocal cues. Moreover, Step-Audio2 supports tool-calling and retrieval-augmented generation (RAG), allowing it to access external knowledge sources or audio/text databases, thus reducing hallucinations and improving coherence in complex dialogues.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    InvokeAI

    InvokeAI

    InvokeAI is a leading creative engine for Stable Diffusion models

    InvokeAI is an implementation of Stable Diffusion, the open source text-to-image and image-to-image generator. It provides a streamlined process with various new features and options to aid the image generation process. It runs on Windows, Mac and Linux machines, and runs on GPU cards with as little as 4 GB or RAM. InvokeAI is a leading creative engine built to empower professionals and enthusiasts alike. Generate and create stunning visual media using the latest AI-driven technologies....
    Downloads: 19 This Week
    Last Update:
    See Project
  • 25
    zclaw

    zclaw

    Your personal AI assistant at all-in 888KiB

    ...It includes support for GPIO control, scheduled tasks, memory handling, and other embedded automation features that enable real-world device interaction. The architecture is optimized for efficiency, allowing the full assistant stack to run in under one megabyte of space. By targeting low-power hardware, zclaw explores the future of edge AI assistants that operate independently of large cloud systems. Overall, the project showcases how lightweight autonomous assistants can be embedded directly into IoT devices.
    Downloads: 2 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB