$ whoami

ML and HPC engineer specializing in computer vision, C++, and GPU performance. I like working across the whole ML stack, from the highest level to the lowest.

// RESEARCH
plasma-lab.research
PLASMA LAB · UMASS · ACTIVE
GPU Bottleneck Diagnosis in Python Profilers

Most profilers show you that something is slow, but not why. I am building a pipeline that uses CUPTI hardware counters to classify each line of Python code as compute-bound, memory-bound, or latency-bound, then overlays that classification onto Scalene's per-line output along with AI-generated fix suggestions.

CUPTI · CUDA · SCALENE · ROOFLINE · PYTHON
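A minimal sketch of how a roofline-style classification of one source line could work, not the pipeline's actual code. It assumes per-line totals for floating-point operations, DRAM traffic, and GPU time have already been collected; the `LineCounters` fields, the `classify` function, the peak figures, and the 0.5 utilization threshold are all hypothetical and are not CUPTI metric names or real device specifications.

```python
from dataclasses import dataclass


@dataclass
class LineCounters:
    """Hypothetical per-line totals; field names are illustrative only."""
    flops: float       # floating-point operations attributed to the line
    dram_bytes: float  # bytes moved to or from device memory
    elapsed_s: float   # GPU time attributed to the line, in seconds


def classify(c, peak_flops, peak_bw, busy_threshold=0.5):
    """Label a line as compute-, memory-, or latency-bound (roofline heuristic)."""
    if c.elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")

    # Fraction of each hardware peak that the line actually achieved.
    compute_util = (c.flops / c.elapsed_s) / peak_flops
    memory_util = (c.dram_bytes / c.elapsed_s) / peak_bw

    # If neither resource is close to saturated, the time is assumed to be
    # going to stalls and launch overhead rather than throughput limits.
    if max(compute_util, memory_util) < busy_threshold:
        return "latency-bound"

    # Ridge point of the roofline: the arithmetic intensity (FLOPs per byte)
    # at which the compute roof and the memory roof meet.
    ridge = peak_flops / peak_bw
    intensity = c.flops / c.dram_bytes if c.dram_bytes else float("inf")
    return "compute-bound" if intensity >= ridge else "memory-bound"


# Made-up peaks: 1 TFLOP/s and 100 GB/s, so the ridge point is 10 FLOPs/byte.
PEAK_FLOPS, PEAK_BW = 1e12, 1e11
label = classify(LineCounters(8e11, 1e10, 1.0), PEAK_FLOPS, PEAK_BW)
```

The utilization check runs first because a line with high arithmetic intensity can still be slow for reasons the roofline does not capture, such as many tiny kernel launches.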
// PROJECTS
matmul.cu
CUDA MATMUL
Register-tiled GEMM kernel achieving 3.18× speedup over cuBLAS on RTX 4070S via shared memory tiling and warp-level matrix primitives.
CUDA · C++ · GPU
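The tiling idea behind a kernel like this can be sketched on the CPU in plain Python. This is an illustration of the loop structure only, not the CUDA kernel itself: the `tiled_matmul` function and its `tile` parameter are hypothetical, and warp-level matrix primitives have no equivalent here. The comments mark where a GPU version would typically stage tiles in shared memory and keep the accumulator in a register.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply a (n x k) by b (k x m) one tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")

    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of the output
        for j0 in range(0, m, tile):      # tile column of the output
            for k0 in range(0, k, tile):  # walk along the shared dimension
                # On a GPU, the a[i0:, k0:] and b[k0:, j0:] tiles would be
                # loaded into shared memory once here and reused below.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]     # stands in for a register accumulator
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
product = tiled_matmul(a, b, tile=2)
```

Each element of a tile is reused `tile` times once it is loaded, which is where the reduction in global memory traffic comes from in the GPU setting.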
fusion-pass.cpp
LLVM FUSION PASS
Optimization pass for automatic GPU kernel fusion. Reduces memory bandwidth overhead in transformer inference by fusing elementwise ops at the IR level.
LLVM · C++ · IR · MLIR
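A toy Python model of why fusing elementwise ops saves memory bandwidth. It is not the pass itself and does not use any LLVM or MLIR API; `run_unfused`, `fuse`, and `run_fused` are hypothetical names, and the pass count stands in for the number of round trips through memory.

```python
def run_unfused(ops, xs):
    """Apply each op in its own pass, materializing an intermediate each time."""
    passes = 0
    for op in ops:
        xs = [op(x) for x in xs]  # one full read and one full write of the data
        passes += 1
    return xs, passes


def fuse(ops):
    """Compose a chain of elementwise ops into a single per-element function."""
    def fused(x):
        for op in ops:
            x = op(x)  # intermediate values never leave the local variable
        return x
    return fused


def run_fused(ops, xs):
    """Apply the whole chain in one pass over the data."""
    kernel = fuse(ops)
    return [kernel(x) for x in xs], 1


# Scale, bias, then ReLU: a typical elementwise chain in transformer inference.
ops = [lambda x: x * 2.0, lambda x: x + 1.0, lambda x: max(x, 0.0)]
data = [-3.0, 0.5, 2.0]
unfused_out, unfused_passes = run_unfused(ops, data)
fused_out, fused_passes = run_fused(ops, data)
```

Both versions compute the same values; the fused one touches the data once instead of once per op, which is the saving an IR-level fusion pass aims for.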
workmark.proj
WORKMARK
CS × SMB marketplace with cryptographically verified work records. Connects skilled contractors with local businesses via attestation-backed reputation scores.
NEXT.JS · POSTGRES · TYPESCRIPT
profiler.py
PIXEL PROFILER
Real-time GPU counter dashboard with per-kernel flame graphs. Python API wrapping CUPTI with a live ncurses TUI for interactive kernel analysis.
PYTHON · CUPTI · TUI · NCURSES
~/shivansh-soni/portfolio
Senior · CS · UMass · 2026
shivanshsoni@umass.edu