- Amazon
- Seattle, WA
- http://www.linkedin.com/in/shauheen
Stars
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
A profiling and performance analysis tool for machine learning
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
A lightweight, standalone C++ inference engine for Google's Gemma models.
A simplified and automated orchestration workflow to perform ML end-to-end (E2E) model tests and benchmarking on Cloud VMs across different frameworks.
pytorch-tpu / transformers
Forked from huggingface/transformers. 🤗 Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.
Generate snapshots and rankings of monthly committer and issue/PR activity
Policy and data administration, distribution, and real-time updates on top of Policy Agents (OPA, Cedar, ...)
A machine learning compiler for GPUs, CPUs, and ML accelerators
A C++ standalone library for machine learning
Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimentation and parallelization, and has demonstrated industry lead…
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
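For context, a minimal sketch of what "speeding up unmodified programs" looks like, assuming this refers to the TorchDynamo/torch.compile stack in PyTorch 2.x; the toy function and shapes are illustrative:

```python
import torch

# A toy function; torch.compile traces and optimizes it without any
# source changes (requires PyTorch 2.x).
def fn(x, y):
    return torch.sin(x) + torch.cos(y)

compiled_fn = torch.compile(fn)

x = torch.randn(1024)
y = torch.randn(1024)
out = compiled_fn(x, y)  # first call compiles; later calls reuse the compiled code
```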
ronghanghu / xla
Forked from pytorch/xla. Enabling PyTorch on Google TPU.
A performant and modular runtime for TensorFlow
A list of awesome compiler projects and papers for tensor computation and deep learning.
Development repository for the Triton language and compiler
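A minimal sketch of a Triton kernel (element-wise vector add); the block size and launch grid are illustrative choices, and it needs a CUDA-capable GPU:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x, y, block_size=1024):
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, block_size),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=block_size)
    return out

# Requires a GPU.
x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
assert torch.allclose(add(x, y), x + y)
```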
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
Tensors and Dynamic neural networks in Python with strong GPU acceleration
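A minimal sketch of the dynamic (define-by-run) autograd that description refers to; the toy tensor is illustrative:

```python
import torch

# Gradients are recorded as operations execute, so ordinary Python
# control flow works inside model code.
x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()
y.backward()
print(x.grad)  # equals 2 * x
```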
Model parallel transformers in JAX and Haiku
Hummingbird compiles trained ML models into tensor computation for faster inference.
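A minimal sketch of what that compilation step might look like, assuming the hummingbird.ml.convert API with a scikit-learn model and the PyTorch backend; the data is synthetic:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from hummingbird.ml import convert

# Train an ordinary scikit-learn model on synthetic data.
X = np.random.rand(200, 10).astype(np.float32)
y = np.random.randint(2, size=200)
clf = RandomForestClassifier(n_estimators=10).fit(X, y)

# Compile the fitted model into tensor operations (PyTorch backend here).
hb_model = convert(clf, "pytorch")
preds = hb_model.predict(X)
```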
Google Cloud TPU Utilization Bar for Training Models
Your PyTorch AI Factory - Flash enables you to easily configure and run complex AI recipes for over 15 tasks across 7 data domains
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
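A minimal sketch of inference with the high-level pipeline API; the task and input text are illustrative:

```python
from transformers import pipeline

# pipeline() downloads a default pretrained model for the task on first use.
classifier = pipeline("sentiment-analysis")
print(classifier("Starred repositories make a handy reading list."))
```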
PyTorch extensions for high performance and large scale training.

