$ whoami
ML and HPC engineer specializing in computer vision, C++, and GPU performance. I like working across the whole ML stack, from the highest level to the lowest.
// RESEARCH
// PROJECTS
matmul.cu
CUDA MATMUL
Register-tiled GEMM kernel achieving 3.18× speedup over cuBLAS on RTX 4070S via shared memory tiling and warp-level matrix primitives.
CUDA · C++ · GPU
> VIEW SOURCE ↗
fusion-pass.cpp
LLVM FUSION PASS
Optimization pass for automatic GPU kernel fusion. Reduces memory bandwidth overhead in transformer inference by fusing elementwise ops at the IR level.
LLVM · C++ · IR · MLIR
> VIEW SOURCE ↗
workmark.proj
WORKMARK
CS × SMB marketplace with cryptographically verified work records. Connects skilled contractors with local businesses via attestation-backed reputation scores.
NEXT.JS · POSTGRES · TYPESCRIPT
> VIEW SOURCE ↗
profiler.py
PIXEL PROFILER
Real-time GPU counter dashboard with per-kernel flame graphs. Python API wrapping CUPTI with a live ncurses TUI for interactive kernel analysis.
PYTHON · CUPTI · TUI · NCURSES
> VIEW SOURCE ↗