Velocity

Intro

Velocity is a performance-oriented machine learning framework built from scratch in C++ to understand how modern ML frameworks execute at the systems level. The project explores the design of a lightweight autograd engine, tensor operations, and execution graphs without relying on existing ML frameworks. The framework emphasizes low-level control, clear abstractions, and performance transparency, which makes it possible to experiment with memory layout, execution order, and kernel fusion. Velocity is designed to be extensible; its roadmap includes CUDA-based GPU acceleration, custom operators, and optimized kernels for common ML workloads. As a learning-driven systems project, Velocity aims to bridge the gap between ML theory and the hardware and runtime mechanisms that power real-world machine learning frameworks.
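The description above does not show Velocity's actual autograd interface. The following is therefore only a hypothetical, minimal sketch of reverse-mode automatic differentiation over a dynamically built execution graph. It works on scalars rather than tensors for brevity, and the names `Node`, `make_var`, `add`, `mul`, and `backward` are illustrative assumptions, not Velocity's API.

```cpp
#include <functional>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical scalar node in a dynamically built computation graph
// (illustrative only; not Velocity's real types).
struct Node {
    double value = 0.0;
    double grad = 0.0;
    std::vector<std::shared_ptr<Node>> parents;  // inputs that produced this node
    std::function<void()> backward_fn = [] {};   // pushes this->grad into parents
};

using Var = std::shared_ptr<Node>;

// Create a leaf variable (no parents, no-op backward).
Var make_var(double v) {
    auto n = std::make_shared<Node>();
    n->value = v;
    return n;
}

// out = a + b; local derivatives are 1 for both inputs.
Var add(const Var& a, const Var& b) {
    auto out = make_var(a->value + b->value);
    out->parents = {a, b};
    Node* o = out.get();  // raw pointer avoids a shared_ptr cycle through the lambda
    out->backward_fn = [o, a, b] {
        a->grad += o->grad;
        b->grad += o->grad;
    };
    return out;
}

// out = a * b; d(out)/da = b, d(out)/db = a.
Var mul(const Var& a, const Var& b) {
    auto out = make_var(a->value * b->value);
    out->parents = {a, b};
    Node* o = out.get();
    out->backward_fn = [o, a, b] {
        a->grad += b->value * o->grad;
        b->grad += a->value * o->grad;
    };
    return out;
}

// Topologically sort the graph, then run backward_fn from the root toward the leaves.
void backward(const Var& root) {
    std::vector<Node*> order;
    std::unordered_set<Node*> visited;
    std::function<void(Node*)> visit = [&](Node* n) {
        if (!visited.insert(n).second) return;  // already visited
        for (auto& p : n->parents) visit(p.get());
        order.push_back(n);  // post-order: inputs are placed before their consumers
    };
    visit(root.get());
    root->grad = 1.0;  // d(root)/d(root)
    for (auto it = order.rbegin(); it != order.rend(); ++it) (*it)->backward_fn();
}
```

For example, building `z = add(mul(x, y), x)` with `x = 3` and `y = 4` and then calling `backward(z)` gives `z->value == 15`, `x->grad == 5` (that is, y + 1), and `y->grad == 3` (that is, x). Gradients are accumulated with `+=` so that a variable used more than once, like `x` here, receives the sum of its contributions. A tensor-level engine would follow the same graph and traversal structure, with each node's backward function operating on buffers instead of scalars.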

2026
