AI Infra Engineer · CUDA Developer · Open to Roles

Dante Xiang

Building

Focused on the systems that make large-scale AI fast — GPU kernel optimization, LLM inference acceleration, and distributed training infrastructure.

Target Year: 2026

about

Who I Am

I'm an engineer obsessed with making AI models run faster and cheaper at scale. My focus is at the intersection of GPU programming, systems design, and ML infrastructure.

Currently diving deep into CUDA kernel optimization, memory hierarchy tuning, and LLM serving systems like vLLM and TensorRT-LLM. Actively targeting AI Infra / CUDA engineering roles.

When I'm not writing kernels, I'm grinding LeetCode or reading arXiv papers on distributed training and inference acceleration.

cat focus.txt
→ CUDA kernel optimization
→ LLM inference & serving
→ Distributed training systems
→ Memory bandwidth & hierarchy
→ GPU cluster scheduling
echo $STATUS
🟢 Open to opportunities
# target: AI Infra / CUDA roles 2026

skills

Tech Stack

GPU / CUDA
CUDA C++: 85%
Triton: 70%
cuBLAS / NCCL: 65%
Nsight Profiling: 75%

ML Infra
PyTorch: 90%
vLLM: 75%
TensorRT-LLM: 65%
DeepSpeed / Megatron: 60%

Systems
C++: 80%
Python: 92%
Linux / Docker: 85%
Kubernetes: 65%

projects

What I'm Building

COMING SOON
CUDA Kernel Playground

Hand-written CUDA kernels for common ML ops — matrix multiply, Flash Attention, softmax. Benchmarked against cuBLAS with Nsight profiling.

CUDA C++ · cuBLAS · Nsight

COMING SOON
LLM Inference Benchmark

Systematic benchmarking of vLLM, TensorRT-LLM, and LMDeploy across batch sizes and architectures. Latency vs throughput analysis.

vLLM · TensorRT-LLM · Python

COMING SOON
Distributed Training Lab

Experiments with tensor, pipeline, and data parallelism using PyTorch and DeepSpeed. Scaling-law analysis on small models.

PyTorch · DeepSpeed · NCCL

contact

Let's Build Together

Open to AI Infra / CUDA engineering roles.
Let's talk about making models fast at scale.